Skip to content

The New York Times Annotated Corpus

SummarizationInformation ExtractionEnglish

Created by Sandhaus et al. at 2008, the The New York Times Annotated Corpus is a summarization dataset in English containing 1.8 records in XML format.

About The New York Times Annotated Corpus

Dataset contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007 with article metadata provided by the New York Times Newsroom.

Details

Task
Summarization, Information Extraction
Language
English
Format
XML
Rows / instances
1.8M
Creator
Sandhaus et al.
Year
2008
Download

FAQ