Open Research Corpus

The Open Research Corpus is an English text corpus released by Ammar et al. in 2018, comprising over 39 million published research papers.

This dataset is used as an LLM benchmark.

About Open Research Corpus

The dataset contains over 39 million published research papers in Computer Science, Neuroscience, and Biomedicine.

Details

Task: Text Corpora
Language: English
Format: JSON
Rows / instances: 39M
Creator: Ammar et al.
Year: 2018