Overview
LMArena (also known as Chatbot Arena) is a leading open benchmarking platform designed to evaluate large language models (LLMs) through human preference. Unlike static benchmarks, which can be contaminated by training data, LMArena uses crowdsourced blind A/B testing to determine which AI models deliver the most helpful and accurate responses in real-world scenarios.
Key Capabilities
- Blind Battle Mode: Users enter a prompt, and two anonymous models generate responses. The user votes for the better output without knowing which model produced which answer.
- Elo Rating System: The platform computes an Elo score for each model from thousands of crowdsourced battles, producing a dynamic and trustworthy leaderboard.
- Diverse Model Support: The platform tracks a wide array of proprietary models (like GPT-4 and Claude) and open-source alternatives (like Llama and Mistral).
- Category-Specific Rankings: Users can filter performance by coding, hard prompts, or general conversation to see which model excels in specific domains.
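The Elo system mentioned above works by adjusting two models' ratings after each blind battle, with the size of the adjustment depending on how surprising the outcome was. Below is a minimal sketch of the classic pairwise Elo update; the K-factor of 32 and the 400-point logistic scale are standard textbook defaults, not LMArena's actual parameters (the live leaderboard uses more elaborate statistical estimation).

```python
def elo_update(rating_a, rating_b, score_a, k=32):
    """Update two Elo ratings after one battle.

    score_a is 1.0 if model A wins, 0.0 if it loses, 0.5 for a tie.
    k and the 400-point scale are illustrative defaults.
    """
    # Expected score of A under the logistic Elo model.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    # The winner gains exactly what the loser gives up.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Two equally rated models; A wins the blind battle.
a, b = elo_update(1000, 1000, 1.0)
print(a, b)  # → 1016.0 984.0
```

An upset (a low-rated model beating a high-rated one) moves the ratings more than an expected win, which is why a model's leaderboard position stabilizes as it accumulates battles.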
Best For
- AI Researchers: To track the state-of-the-art performance of LLMs.
- Developers: To decide which API or open-source model to integrate into their applications based on human-centric quality.
- AI Enthusiasts: To experiment with multiple top-tier models in one interface for free.
Limitations & Pricing
LMArena is primarily a research tool and is free to use. However, users should note that results are based on human preferences and can therefore be subjective. Additionally, because it is a community-driven platform, response times may vary with server load.
Disclaimer: Features and available models may change frequently. Please verify the current leaderboard and terms on the official website.