AI Model Benchmarks

LLMEval3: A professional evaluation benchmark from Fudan University's NLP Lab designed to measure the performance and reliability of large language models.
C-Eval: A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) in the Chinese language.
SuperCLUE: A professional evaluation framework providing standardized benchmarks to measure the intelligence and utility of Chinese-language AI models.