AI Model Benchmarks

AI Model Benchmark MagicArena

MagicArena is a competitive benchmarking platform that evaluates and ranks visual generative AI models through direct human comparisons.

AI Model Benchmark AGI-Eval

AGI-Eval is a specialized evaluation community designed to benchmark the capabilities and performance of a range of large language models.

AI Model Benchmark H2O EvalGPT

An evaluation system by H2O.ai that uses Elo ratings to benchmark and rank large language models (LLMs).

AI Model Benchmark LLMEval3

A professional evaluation benchmark from Fudan University’s NLP Lab designed to measure the performance and reliability of large language models.

AI Model Benchmark MMBench

MMBench is a comprehensive evaluation framework designed to measure the capabilities of multimodal large language models across a wide range of visual and textual tasks.

AI Model Benchmark HELM

A standardized, holistic evaluation framework from Stanford University designed to measure the performance and safety of large language models.

AI Model Benchmark OpenCompass

OpenCompass is an open-source evaluation framework developed by the Shanghai AI Lab to provide standardized, comprehensive benchmarking for large language models.

AI Model Benchmark FlagEval

An open-source evaluation framework developed by the Beijing Academy of Artificial Intelligence (BAAI) to standardize and scale LLM benchmarking.

AI Model Benchmark LMArena

A crowdsourced benchmarking platform where users battle-test large language models through blind side-by-side comparisons.

AI Model Benchmark MMLU

MMLU is a multiple-choice benchmark designed to evaluate the general knowledge and problem-solving capabilities of large language models across 57 subjects, spanning STEM, the humanities, and the social sciences.