AIToolsFly
  • AI Applications
    • AI Agents
    • AI Chatbots
    • AI Document Tools
    • AI Office Tools
    • AI Presentation Tools
    • AI Productivity Tools
    • AI Search Engines
    • AI Video Tools
    • AI Writing Tools
  • AI Content Creation
    • AI Audio Tools
    • AI Design Tools
    • AI Image Background Removers
    • AI Image Generators
    • AI Image Tools
  • AI Development
    • AI Frameworks
    • AI Models
    • AI Programming Tools
    • AI Prompt Tools
  • AI Analysis & Optimization
    • AI Content Detection and Optimization Tools
    • AI Model Benchmarks
  • AI Learning Resources
    • Websites to Learn AI
Tag: LLM Evaluation
AGI-Eval

AGI-Eval is a specialized evaluation community designed to benchmark the capabilities and performance of various AI large language models.

AI Model Benchmarks · December 18, 2024
H2O EvalGPT

An advanced evaluation system by H2O.ai that utilizes Elo rating methodologies to benchmark and rank Large Language Models (LLMs).

AI Model Benchmarks · October 29, 2023
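
The Elo ranking that H2O EvalGPT applies can be illustrated with the standard update rule, where each pairwise model comparison nudges both ratings toward the observed outcome. This is a minimal sketch of the generic Elo formula, not H2O.ai's actual implementation; the K-factor of 32 is an assumed default.

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Standard Elo update after one pairwise comparison.

    score_a is 1.0 if model A wins, 0.5 for a tie, 0.0 if it loses.
    k (the K-factor) controls how far a single result moves the ratings.
    """
    # Expected score of A, given the current rating gap.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # Both ratings move by the same amount in opposite directions.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta
```

For example, two models that both start at 1500 and meet once, with A winning, end at 1516 and 1484: the loser's drop mirrors the winner's gain, so the total rating in the pool stays constant.
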
LLMEval3

A professional evaluation benchmark from Fudan University’s NLP Lab designed to measure the performance and reliability of large language models.

AI Model Benchmarks · October 29, 2023
HELM

A standardized, holistic evaluation framework from Stanford University designed to measure the performance and safety of large language models.

AI Model Benchmarks · October 29, 2023
OpenCompass

OpenCompass is an open-source evaluation framework developed by the Shanghai AI Lab to provide standardized, comprehensive benchmarking for large language models.

AI Model Benchmarks · October 29, 2023
FlagEval

An open-source evaluation framework developed by the Beijing Academy of Artificial Intelligence (BAAI) to standardize and scale LLM benchmarking.

AI Model Benchmarks · October 29, 2023
MMLU

MMLU is a comprehensive benchmark designed to evaluate the general knowledge and problem-solving capabilities of large language models across a vast array of disciplines.

AI Model Benchmarks · October 29, 2023
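
MMLU's scoring reduces to exact-match accuracy over four-option multiple-choice questions, reported per subject and as an average. The sketch below shows that convention with invented toy data; the real benchmark spans 57 subjects, so the answer key here is purely illustrative.

```python
def choice_accuracy(predictions, answers):
    """Fraction of multiple-choice answers (letters A-D) matched exactly."""
    if len(predictions) != len(answers):
        raise ValueError("prediction and answer lists must align")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Toy example: a model's letter picks vs. the answer key (invented data).
model_picks = ["A", "C", "C", "B"]
answer_key  = ["A", "C", "D", "B"]
```

Here `choice_accuracy(model_picks, answer_key)` yields 0.75; MMLU computes this per subject and then averages across subjects.
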
C-Eval

A comprehensive evaluation suite designed to assess the knowledge and capabilities of large language models (LLMs) specifically in the Chinese language.

AI Model Benchmarks · October 29, 2023
SuperCLUE

A professional evaluation framework providing standardized benchmarks to measure the intelligence and utility of Chinese-language AI models.

AI Model Benchmarks · October 29, 2023
CMMLU

A comprehensive evaluation benchmark designed to measure the general knowledge and linguistic capabilities of Large Language Models in Chinese.

AI Model Benchmarks · October 29, 2023
Page 1 of 2
About Us

AIToolsFly is a curated directory of AI tools, productivity platforms, and digital resources. We help users quickly discover and compare the best tools across different categories.

Copyright Notice

© 2026 AIToolsFly. All rights reserved. All content is for informational purposes only. Trademarks and product names belong to their respective owners.