SuperCLUE

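SuperCLUE is a comprehensive evaluation benchmark for general-purpose Chinese large language models, launched by the CLUE academic community. It aims to assess model performance along three dimensions: basic abilities, professional abilities, and Chinese-specific abilities.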

Location:

China

Language:

CN

Collection time:

2025-05-20

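Model Evaluation # AI model benchmarks # AI model leaderboards # AI model evaluation # CLUE academic community # SuperCLUE # Professional ability evaluation # Chinese LLM evaluation # Chinese-specific evaluation # Basic ability evaluation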

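As AI technology advances rapidly, how to evaluate the performance of Chinese large language models objectively and fairly has become a focus of the industry. SuperCLUE, launched by the CLUE academic community in May 2023, is a comprehensive evaluation benchmark created to address exactly this problem.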

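Site Introduction

SuperCLUE's full name is the "Comprehensive Evaluation Benchmark for General-Purpose Chinese Large Models". It aims to assess Chinese large language models across multiple dimensions. Through regularly updated leaderboards, SuperCLUE gives researchers and developers a direct comparison of model performance.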

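Features

SuperCLUE's evaluation framework covers three main dimensions:

Basic abilities: evaluates models on 10 core abilities, including semantic understanding, dialogue, logical reasoning, role-play, code generation, and creative writing.
Professional abilities: uses middle-school, university, and professional exam questions covering more than 50 subjects, such as mathematics, physics, geography, and the social sciences, to test the depth of a model's domain knowledge.
Chinese-specific abilities: targets 10 abilities tied to tasks unique to Chinese, such as idiom interpretation, poetry composition, literary comprehension, and character-form recognition, to evaluate how well a model understands and applies Chinese culture.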

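Related Projects

SuperCLUE has also released the following evaluation tracks:

SuperCLUE-OPEN: a multi-turn, open-ended question-answering evaluation that examines how models perform in open dialogue.
SuperCLUE-OPT: a closed-book test built on more than 3,700 objective questions that evaluates a model's performance on questions with fixed answers.
Langya Leaderboard (琅琊榜): an anonymous head-to-head battle platform that generates model rankings dynamically from user votes, reflecting performance in real-world use.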
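
The page does not say how votes on the battle platform are turned into a ranking. As an illustration only, the sketch below assumes an Elo-style update, a common choice for anonymous pairwise-vote leaderboards; the function names, the constants `K` and `BASE`, and the model names are all hypothetical and not taken from SuperCLUE.

```python
from collections import defaultdict

K = 32          # update step size (assumed value, not from SuperCLUE)
BASE = 1000.0   # starting rating for every model (assumed value)


def expected_score(r_a, r_b):
    """Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def rank_from_votes(votes):
    """Turn (model_a, model_b, outcome) votes into a ranking.

    outcome is "a" if the voter preferred model_a, "b" if model_b,
    and "tie" otherwise. Returns (model, rating) pairs, best first.
    """
    ratings = defaultdict(lambda: BASE)
    for model_a, model_b, outcome in votes:
        score_a = {"a": 1.0, "b": 0.0, "tie": 0.5}[outcome]
        exp_a = expected_score(ratings[model_a], ratings[model_b])
        delta = K * (score_a - exp_a)
        # Zero-sum update: what one model gains, the other loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    votes = [
        ("model-x", "model-y", "a"),
        ("model-x", "model-z", "a"),
        ("model-y", "model-z", "tie"),
    ]
    for name, rating in rank_from_votes(votes):
        print(f"{name}: {rating:.1f}")
```

Because each vote moves both ratings by the same amount in opposite directions, the ranking shifts as new votes arrive, which is one plausible reading of "dynamically generated" above.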

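Strengths

SuperCLUE's advantages are:

Comprehensiveness: evaluation spans multiple dimensions and domains, giving a broad picture of each model.
Regular updates: the leaderboard is updated monthly, so results keep pace with new model releases.
Openness: models from both inside and outside China can take part, which encourages technical exchange.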

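Pricing

As an academic community project, SuperCLUE publishes its evaluation results openly, and researchers and developers can consult them free of charge.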

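Summary

For users born after 2000 and internet users in general, SuperCLUE offers an intuitive, authoritative platform for evaluating Chinese large language models. It helps users understand the performance differences between models and choose the AI tool that best fits their needs. Developers and general users alike can benefit from it.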

Relevant Navigation

Krea AI

Krea AI is an AI creative platform integrating real-time image generation, video production, image enhancement, and 3D object generation, designed to provide efficient and convenient creation tools for designers, artists, and creative professionals.

Evidently AI

Evidently AI is an open-source AI quality collaboration platform designed for evaluating, testing, and monitoring machine learning models, LLMs, and general AI applications. Its intuitive interface and rich visualization features enable users to promptly identify data drifts and anomalies, ensuring model performance stability.

C-Eval

C-Eval is a Chinese foundational model evaluation suite jointly developed by Shanghai Jiao Tong University, Tsinghua University, and the University of Edinburgh. It comprises 13,948 multiple-choice questions across 52 disciplines and four difficulty levels, aiming to comprehensively assess large language models' Chinese comprehension and reasoning abilities.

Chatbot Arena

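Chatbot Arena is an open, community-driven platform where users evaluate and compare the performance of large language models (LLMs) in real time through anonymous battles and voting.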

PubMedQA

PubMedQA is a question-answering dataset tailored for the biomedical field, comprising 1,000 expert-labeled, 61,200 unlabeled, and 211,300 artificially generated QA instances, aiming to enhance AI models' performance in medical research question-answering tasks.

CMMLU

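CMMLU is a comprehensive evaluation benchmark designed specifically for the Chinese-language context. It covers 67 topics and aims to test language models' knowledge and reasoning abilities.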

Open LLM Leaderboard

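Open LLM Leaderboard is an evaluation platform for open-source large language models (LLMs) launched by Hugging Face. It provides model rankings, performance evaluation, and community collaboration features, helping developers and researchers understand and compare how different LLMs perform.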

Llama 3

Meta introduces Llama 3, offering models with 8B and 70B parameters, marking a significant advancement in open-source AI. Llama 3 builds upon its predecessors' strengths, delivering more efficient and reliable AI solutions through innovation and improvements.