ccdv/pubmed-summarization

Summarization, Text Generation, EN

ccdv/pubmed-summarization is an English-language summarization dataset that provides 266,430 labeled examples in Parquet format. It falls in the 100K<n<1M size category and has been downloaded 4.6K times.

About ccdv/pubmed-summarization

PubMed dataset for summarization of long documents. Adapted from this repo. Note that the original data are pre-tokenized, so this dataset returns " ".join(text) and adds "\n" for paragraphs. This dataset is compatible with th...
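The joining step mentioned in the description can be sketched as follows. This is only an illustration, not the dataset's actual loading script: it assumes each paragraph arrives as a list of token strings, and the function name `join_pretokenized` is hypothetical.

```python
from typing import List


def join_pretokenized(paragraphs: List[List[str]]) -> str:
    """Join pre-tokenized paragraphs into one string.

    Assumption (not confirmed by the dataset page): each paragraph is a
    list of token strings. Tokens are joined with a single space, and
    paragraphs are separated by a newline.
    """
    return "\n".join(" ".join(tokens) for tokens in paragraphs)


# Hypothetical sample input, made up for illustration only.
example = [
    ["the", "patient", "was", "treated", "."],
    ["results", "were", "positive", "."],
]
print(join_pretokenized(example))
```

Because the tokens are joined with plain spaces, punctuation stays separated from the preceding word (for example "treated ." rather than "treated."), which is worth keeping in mind when feeding the text to a tokenizer that expects raw prose.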

Details

Task
Summarization, Text Generation
Language
EN
Format
Parquet
Rows / instances
266430
Size
100K<n<1M
Creator
ccdv
Year
2022
Downloads
4639
Likes
90