Microsoft Machine Reading COmprehension Dataset (MS MARCO)
Question Answering, Reading Comprehension, English, Benchmark
Created by Bajaj et al. in 2016, the Microsoft Machine Reading COmprehension Dataset (MS MARCO) is an English question answering benchmark dataset containing 1,010,916 records in JSON format.
This dataset is used as an LLM benchmark.
About Microsoft Machine Reading COmprehension Dataset (MS MARCO)
A dataset for research on machine reading comprehension, question answering, passage ranking, keyphrase extraction, and conversational search.
Details
- Task: Question Answering, Reading Comprehension
- Language: English
- Format: JSON
- Rows / instances: 1,010,916
- Creator: Bajaj et al.
- Year: 2016
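Because the records are distributed as JSON, a first pass over the data usually amounts to parsing each record and pulling out the question, its answers, and the supporting passages. The sketch below illustrates that idea under stated assumptions: it assumes one JSON object per line, and the field names (query, answers, passages, passage_text, is_selected) and the two sample records are hypothetical placeholders for illustration, not a layout confirmed by this page. Check the actual files before relying on any of them.

```python
import json

# Hypothetical sample in JSON Lines form. The field names and the records
# themselves are assumptions made for illustration only.
SAMPLE_JSONL = """\
{"query": "what is a corporation", "answers": ["A legal entity."], "passages": [{"passage_text": "A corporation is a legal entity.", "is_selected": 1}, {"passage_text": "Unrelated text.", "is_selected": 0}]}
{"query": "how tall is mount everest", "answers": [], "passages": []}
"""


def iter_records(jsonl_text):
    """Yield one dict per non-empty line of JSON Lines text."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def selected_passages(record):
    """Return the texts of passages flagged as selected (assumed flag name)."""
    return [
        p["passage_text"]
        for p in record.get("passages", [])
        if p.get("is_selected")
    ]


records = list(iter_records(SAMPLE_JSONL))
for rec in records:
    # Print the question, its answers, and the selected supporting passages.
    print(rec["query"], "->", rec["answers"], selected_passages(rec))
```

For a file on disk, the same generator could be fed line by line from an open file handle instead of a string, which avoids holding roughly a million records in memory at once.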