OpenCompass is an open-source evaluation framework developed by the Shanghai AI Lab to provide standardized, comprehensive benchmarking for large language models.
MMLU is a comprehensive benchmark designed to evaluate the general knowledge and problem-solving capabilities of large language models across a wide range of disciplines.
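MMLU items are multiple-choice questions with four options and a single gold answer letter, and a model's score is simply its accuracy over the items. As a minimal sketch of that scoring scheme, here is a toy evaluator; the questions below are illustrative stand-ins, not actual MMLU items.

```python
# Toy MMLU-style items: question, four labeled choices, and a gold answer letter.
# These are illustrative stand-ins, NOT real MMLU questions.
TOY_ITEMS = [
    {"question": "What is 2 + 2?",
     "choices": {"A": "3", "B": "4", "C": "5", "D": "6"},
     "answer": "B"},
    {"question": "Which planet is closest to the Sun?",
     "choices": {"A": "Venus", "B": "Earth", "C": "Mercury", "D": "Mars"},
     "answer": "C"},
]

def accuracy(predictions, items):
    """Fraction of items where the predicted letter matches the gold answer."""
    correct = sum(pred == item["answer"] for pred, item in zip(predictions, items))
    return correct / len(items)

if __name__ == "__main__":
    # Suppose a model answered "B" for the first item and "D" for the second:
    print(accuracy(["B", "D"], TOY_ITEMS))  # 1 of 2 correct -> 0.5
```

Real harnesses differ mainly in how they extract the predicted letter (e.g. by comparing per-choice log-likelihoods or parsing generated text), but the final metric reduces to this per-item accuracy, often averaged per subject.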
A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) specifically in Chinese.