allenai/peS2o
Text Generation, Fill Mask, EN, odc-by
allenai/peS2o is an English (EN) text generation dataset from allenai, distributed in Parquet format under the odc-by license. It falls in the 10B<n<100B size category and has been downloaded 11.8K times.
About allenai/peS2o
Pretraining Effectively on S2ORC!
The peS2o dataset is a collection of ~40M creative open-access academic papers,
cleaned, filtered, and formatted for pre-training of language models. It is derived from
the Semantic Scholar Open Research Corpus (L...
Details
- Task: Text Generation, Fill Mask
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 10B<n<100B
- Creator: allenai
- Year: 2023
- License: odc-by
- Downloads: 11767
- Likes: 197