Análise e Otimização de IA

Benchmarks de modelos de IA MMBench

MMBench is a comprehensive evaluation framework designed to measure the capabilities of multimodal large language models across a wide array of visual and textual tasks.

Benchmarks de modelos de IA HELM

A standardized, holistic evaluation framework from Stanford University designed to measure the performance and safety of large language models.

Benchmarks de modelos de IA OpenCompass

OpenCompass is an open-source evaluation framework developed by the Shanghai AI Lab to provide standardized, comprehensive benchmarking for large language models.

Benchmarks de modelos de IA FlagEval

An open-source evaluation framework developed by the Beijing Academy of Artificial Intelligence (BAAI) to standardize and scale LLM benchmarking.

Benchmarks de modelos de IA LMArena

A crowdsourced benchmarking platform where users battle-test Large Language Models through blind side-by-side comparisons.

Benchmarks de modelos de IA MMLU

MMLU é um benchmark abrangente projetado para avaliar o conhecimento geral e as capacidades de resolução de problemas de grandes modelos de linguagem em uma vasta gama de disciplinas.

Benchmarks de modelos de IA C-Eval

A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) specifically in the Chinese language.

Benchmarks de modelos de IA SuperCLUE

A professional evaluation framework providing standardized benchmarks to measure the intelligence and utility of Chinese-language Modelos de IA.

Benchmarks de modelos de IA Open LLM Leaderboard

A comprehensive, community-driven benchmark platform by Hugging Face to track and compare the performance of open-source large language models.

Benchmarks de modelos de IA CMMLU

A comprehensive evaluation benchmark designed to measure the general knowledge and linguistic capabilities of Large Language Models in Chinese.