AI Model Benchmarks

AI Model Benchmark C-Eval

A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) in Chinese.

AI Model Benchmark SuperCLUE

A professional evaluation framework providing standardized benchmarks to measure the capabilities and practical usefulness of Chinese-language AI models.

AI Model Benchmark Open LLM Leaderboard

A community-driven benchmark platform hosted by Hugging Face for tracking and comparing the performance of open-source large language models.

AI Model Benchmark CMMLU

A comprehensive evaluation benchmark designed to measure the general knowledge and linguistic capabilities of large language models in Chinese.

AI Model Benchmark PubMedQA

PubMedQA is a specialized biomedical question-answering dataset and leaderboard used to benchmark the accuracy of AI models in the medical domain.