The New York Times Annotated Corpus
Summarization, Information Extraction, English
Created by Sandhaus et al. in 2008, The New York Times Annotated Corpus is a summarization and information extraction dataset in English containing 1.8 million records in XML format.
About The New York Times Annotated Corpus
The dataset contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007, with article metadata provided by the New York Times Newsroom.
Details
- Task: Summarization, Information Extraction
- Language: English
- Format: XML
- Rows / instances: 1.8M
- Creator: Sandhaus et al.
- Year: 2008
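
Because the records are distributed as XML, a first step for either task is usually to pull the headline and body text out of each article. The snippet below is a minimal sketch of that step using Python's standard library. The element names (`title` for the headline, `p` for body paragraphs), the `parse_article` helper, and the `SAMPLE` document are assumptions made for illustration, not taken from the corpus documentation, so check them against the actual schema before relying on them.

```python
import xml.etree.ElementTree as ET


def parse_article(xml_text):
    """Return (title, paragraphs) extracted from one article's XML string."""
    root = ET.fromstring(xml_text)

    # Assumed element name: the first <title> holds the headline.
    title_el = root.find(".//title")
    title = title_el.text.strip() if title_el is not None and title_el.text else ""

    # Assumed element name: every <p> is a body paragraph.
    # itertext() also collects text nested inside inline child elements.
    paragraphs = ["".join(p.itertext()).strip() for p in root.iter("p")]
    return title, [p for p in paragraphs if p]


# Hypothetical stand-in for one article file; not a real corpus record.
SAMPLE = """<nitf>
  <head><title>Example Headline</title></head>
  <body><body.content>
    <block class="full_text"><p>First paragraph.</p><p>Second paragraph.</p></block>
  </body.content></body>
</nitf>"""

if __name__ == "__main__":
    title, paragraphs = parse_article(SAMPLE)
    print(title)
    print(len(paragraphs), "paragraphs")
```

With 1.8M records, it would likely make sense to call a function like this once per file inside a loop or generator rather than loading every article into memory at once.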