
ccdv/arxiv-summarization

Summarization, Text Generation, EN

Created by ccdv in 2022, ccdv/arxiv-summarization is an English-language summarization dataset containing 431,826 records in Parquet format. It has 8.6K downloads and 124 likes, and falls in the 100K<n<1M size category.

About ccdv/arxiv-summarization

Arxiv dataset for summarization. Dataset for summarization of long documents. Adapted from this repo. Note that the original data are pre-tokenized, so this dataset returns " ".join(text) and adds "\n" for paragraphs. This dataset is compatible with the...
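The joining step mentioned in the description can be illustrated with a minimal sketch. It assumes, purely for illustration, that a pre-tokenized document is a list of paragraphs and each paragraph is a list of tokens; the function name `join_pretokenized` and the sample data are hypothetical and are not taken from the dataset's own loading code.

```python
def join_pretokenized(paragraphs):
    """Join tokens with single spaces and paragraphs with newlines.

    Assumed input shape (hypothetical): a list of paragraphs,
    each paragraph being a list of token strings.
    """
    return "\n".join(" ".join(tokens) for tokens in paragraphs)


# Hypothetical pre-tokenized document with two paragraphs.
doc = [
    ["we", "study", "long", "documents", "."],
    ["results", "follow", "."],
]

text = join_pretokenized(doc)
print(text)
```

Because the tokens are rejoined with plain spaces, punctuation stays separated from the preceding word, as in "documents ." rather than "documents.".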

Details

Task: Summarization, Text Generation
Language: EN
Format: Parquet
Rows / instances: 431,826
Size: 100K<n<1M
Creator: ccdv
Year: 2022
Downloads: 8,586
Likes: 124
