MMBench is a comprehensive evaluation framework designed to measure the capabilities of multimodal large language models across a wide array of visual and textual tasks.
OpenCompass is an open-source evaluation framework developed by the Shanghai AI Lab to provide standardized, comprehensive benchmarking for large language models.
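To make concrete what such standardized benchmarking involves at its core, here is a minimal, framework-agnostic Python sketch of an exact-match scoring loop. The `query_model` callable and the item format are hypothetical placeholders for illustration, not OpenCompass's actual API.

```python
# Minimal, framework-agnostic sketch of the scoring loop at the heart of a
# benchmark harness. `query_model` is a hypothetical stand-in for whatever
# inference backend the harness wraps; it is NOT OpenCompass's real API.
from typing import Callable

def evaluate(
    items: list[dict],                  # each: {"prompt": str, "answer": str}
    query_model: Callable[[str], str],  # maps a prompt to the model's reply
) -> float:
    """Return exact-match accuracy of the model over a benchmark split."""
    correct = 0
    for item in items:
        prediction = query_model(item["prompt"]).strip()
        if prediction == item["answer"]:
            correct += 1
    return correct / len(items)

if __name__ == "__main__":
    # Toy run with a trivial "model" that always answers "B".
    sample = [
        {"prompt": "2 + 2 = ?  A) 3  B) 4  Answer with the letter.", "answer": "B"},
        {"prompt": "Capital of France?  A) Paris  B) Rome  Answer with the letter.", "answer": "A"},
    ]
    print(f"accuracy = {evaluate(sample, lambda p: 'B'):.2f}")  # 0.50
```

Real harnesses such as OpenCompass add prompt templating, batching, and per-task metrics on top of this basic loop, but the accuracy computation reduces to the same comparison of predictions against references.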
MMLU is a comprehensive benchmark designed to evaluate the general-knowledge and problem-solving capabilities of large language models across a wide range of disciplines.
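As a quick illustration of working with MMLU directly, the snippet below loads the benchmark through the Hugging Face `datasets` library; the dataset id `cais/mmlu`, the `all` configuration, and the field names are assumptions based on the public hub listing and should be verified against the version you install.

```python
# A minimal sketch of inspecting MMLU via the Hugging Face `datasets` library.
# The dataset id "cais/mmlu" and the field names below are assumptions; check
# them against the hub listing for the release you actually use.
from datasets import load_dataset

# The "all" configuration merges the 57 subject-specific subsets.
mmlu = load_dataset("cais/mmlu", "all", split="test")

example = mmlu[0]
print(example["question"])   # question stem
print(example["choices"])    # list of four answer options
print(example["answer"])     # index (0-3) of the correct option
```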
A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) specifically in the Chinese language.
A professional evaluation framework providing standardized benchmarks to measure the intelligence and utility of Chinese-language artificial intelligence models.