Cornell Newsroom

Text Corpora, Summarization, English

Cornell Newsroom is an English text corpus for summarization that provides 1.3 million labeled examples in JSON format.

About Cornell Newsroom

The dataset contains 1.3 million articles and summaries written by authors and editors in the newsrooms of 38 major publications. The summaries were obtained from search and social metadata between 1998 and 2017.

Details

Task: Text Corpora, Summarization
Language: English
Format: JSON
Rows / instances: 1.3M
Creator: Grusky et al.
Year: 2018
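
Since the examples are distributed as JSON, a reader might iterate over them roughly as follows. This is a minimal sketch, not the dataset's official loader: it assumes the file stores one JSON object per line (JSON Lines), optionally gzip-compressed, and the field names `text` and `summary` are hypothetical placeholders, since this page does not list the actual schema. The helper names `iter_records` and `open_corpus` are likewise illustrative.

```python
import gzip
import io
import json
from typing import Iterator, TextIO


def iter_records(lines: TextIO) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def open_corpus(path: str) -> TextIO:
    """Open a plain or gzip-compressed text file for reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


# Tiny in-memory sample standing in for a real file. The field names
# "text" and "summary" are assumptions, not taken from the dataset docs.
sample = io.StringIO(
    '{"text": "Full article body.", "summary": "Short summary."}\n'
    "\n"
    '{"text": "Another article.", "summary": "Another summary."}\n'
)
records = list(iter_records(sample))
print(len(records), records[0]["summary"])
```

Streaming line by line keeps memory use flat, which matters for a corpus of 1.3M rows. For a real file, `iter_records(open_corpus(path))` would replace the in-memory sample, after checking the actual field names against the downloaded data.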
