ccdv/arxiv-summarization
Summarization · Text Generation · EN
Created by ccdv in 2022, ccdv/arxiv-summarization is an English (EN) summarization dataset containing 431,826 records in Parquet format. It has 8.6K downloads and 124 likes, and it falls in the 100K<n<1M size category.
About ccdv/arxiv-summarization
arXiv dataset for summarization
Dataset for summarization of long documents. Adapted from this repo. Note that the original data are pre-tokenized, so this dataset returns " ".join(text) and adds "\n" between paragraphs. This dataset is compatible with the...
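The note about pre-tokenized data can be illustrated with a minimal sketch. It assumes, hypothetically, that a document arrives as a list of paragraphs and each paragraph as a list of token strings; the function name `detokenize_document` is illustrative and is not part of the dataset or its loader.

```python
from typing import List


def detokenize_document(paragraphs: List[List[str]]) -> str:
    """Rebuild text from pre-tokenized paragraphs (hypothetical input shape).

    Tokens inside a paragraph are joined with single spaces, and paragraphs
    are separated by newline characters.
    """
    # Joining with " " does not undo tokenization: punctuation tokens stay
    # separated from the preceding word by a space.
    return "\n".join(" ".join(tokens) for tokens in paragraphs)


if __name__ == "__main__":
    doc = [
        ["we", "study", "long", "documents", "."],
        ["results", "are", "reported", "below", "."],
    ]
    print(detokenize_document(doc))
```

Because the text is only re-joined rather than detokenized, outputs keep artifacts such as a space before a final period, which is worth remembering when computing metrics against untokenized references.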
Details
- Task: Summarization, Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: 431,826
- Size category: 100K<n<1M
- Creator: ccdv
- Year: 2022
- Downloads: 8,586
- Likes: 124