AI Model Benchmarks C-Eval A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) in Chinese.
AI Model Benchmarks SuperCLUE A professional evaluation framework providing standardized benchmarks to measure the intelligence and utility of Chinese-language AI models.
AI Model Benchmarks Open LLM Leaderboard A comprehensive, community-driven benchmark platform by Hugging Face to track and compare the performance of open-source large language models.
AI Model Benchmarks CMMLU A comprehensive evaluation benchmark designed to measure the general knowledge and linguistic capabilities of large language models in Chinese.
AI Model Benchmarks PubMedQA PubMedQA is a specialized biomedical question-answering dataset and leaderboard used to benchmark the accuracy of AI models in the medical domain.