
Microsoft Machine Reading COmprehension Dataset (MS MARCO)

Question Answering, Reading Comprehension, English, Benchmark

Created by Bajaj et al. in 2016, the Microsoft Machine Reading COmprehension Dataset (MS MARCO) is an English question answering benchmark containing 1,010,916 records in JSON format.

📊 This dataset is used as an LLM benchmark.

About Microsoft Machine Reading COmprehension Dataset (MS MARCO)

A dataset for studies of machine reading comprehension, question answering, passage ranking, keyphrase extraction, and conversational search.
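To show how one record can serve both question answering and passage ranking, here is a minimal sketch that parses a single record with Python's standard `json` module. It assumes a one-object-per-record layout and uses field names as they appear in the public v2.1 release (`query`, `query_id`, `query_type`, `answers`, `wellFormedAnswers`, and `passages` entries with `is_selected`, `passage_text`, `url`). The exact on-disk layout varies between releases, and the record values below are invented for illustration. The helper names `selected_passages` and `relevance_pairs` are hypothetical.

```python
import json

# Hypothetical record shaped like an MS MARCO v2.1 QA example.
# Field names are assumed from the public release; the values are invented.
RAW_RECORD = """
{
  "query_id": 1,
  "query": "what is the boiling point of water at sea level",
  "query_type": "NUMERIC",
  "answers": ["100 degrees Celsius"],
  "wellFormedAnswers": [],
  "passages": [
    {"is_selected": 0, "url": "https://example.com/a",
     "passage_text": "Water freezes at 0 degrees Celsius."},
    {"is_selected": 1, "url": "https://example.com/b",
     "passage_text": "At sea level, water boils at 100 degrees Celsius."}
  ]
}
"""


def selected_passages(record: dict) -> list[str]:
    """Return the text of passages flagged with is_selected == 1."""
    return [p["passage_text"] for p in record["passages"] if p["is_selected"] == 1]


def relevance_pairs(record: dict) -> list[tuple[str, str, int]]:
    """Turn one record into (query, passage_text, label) triples for ranking."""
    return [(record["query"], p["passage_text"], p["is_selected"])
            for p in record["passages"]]


record = json.loads(RAW_RECORD)
print(record["answers"][0])        # QA target
print(selected_passages(record))   # passages flagged as supporting the answer
for query, passage, label in relevance_pairs(record):
    print(label, passage)          # binary relevance label per passage
```

In this sketch the same `is_selected` flag is used twice. It picks out supporting passages for answer generation, and it also acts as a binary relevance label when the record is reused for passage ranking.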

Details

Task
Question Answering, Reading Comprehension
Language
English
Format
JSON
Rows / instances
1,010,916
Creator
Bajaj et al.
Year
2016

