Question 1

What is the lmsys/mt_bench_human_judgments dataset?

Accepted Answer

This dataset contains 3.3K expert-level pairwise human preferences over responses that 6 models generated for 80 MT-bench questions.
The 6 models are GPT-4, GPT-3.5, Claude-v1, Vicuna-13B, Alpaca-13B, and LLaMA-13B. The ann...

Question 2

Is lmsys/mt_bench_human_judgments a benchmark?

Accepted Answer

lmsys/mt_bench_human_judgments is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download lmsys/mt_bench_human_judgments?

Accepted Answer

lmsys/mt_bench_human_judgments is available at its source: https://huggingface.co/datasets/lmsys/mt_bench_human_judgments.
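
As a rough sketch of what downloading looks like in practice, the snippet below loads the dataset with the Hugging Face `datasets` library and tallies how often each model was preferred. The split name "human" and the field names `model_a`, `model_b`, and `winner` are assumptions about the dataset's layout, and `load_judgments` and `tally_winners` are hypothetical helper names; check the dataset card before relying on any of them.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

DATASET_ID = "lmsys/mt_bench_human_judgments"


def load_judgments(split: str = "human"):
    """Download one split from the Hugging Face Hub (needs network access).

    The default split name "human" is an assumption; see the dataset card.
    """
    # Imported lazily so the rest of the module works without the package.
    # Install with: pip install datasets
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


def tally_winners(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count wins per model.

    Assumes each row has "model_a", "model_b", and a "winner" field whose
    value is "model_a" or "model_b"; anything else is counted as a tie.
    """
    wins: Counter = Counter()
    for row in rows:
        if row["winner"] == "model_a":
            wins[row["model_a"]] += 1
        elif row["winner"] == "model_b":
            wins[row["model_b"]] += 1
        else:
            wins["tie"] += 1
    return wins
```

With network access, `tally_winners(load_judgments())` would produce the per-model win counts for the downloaded split.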
