defunct-datasets/the_pile_books3
defunct-datasets/the_pile_books3 is an English-language dataset aimed at text generation. It provides 196,639 examples in Parquet format, is distributed under the MIT license, falls in the 100K<n<1M size category, and has been downloaded 245 times.
About defunct-datasets/the_pile_books3
This dataset is Shawn Presser's work and is part of EleutherAI's The Pile dataset. It contains all of Bibliotik in plain .txt form, a.k.a. about 197,000 books, processed in exactly the same way as bookcorpusopen (a.k.a. books1). Seems to be ...
Details
- Task
- Text Generation, Fill Mask
- Language
- EN
- Format
- Parquet
- Rows / instances
- 196639
- Size
- 100K<n<1M
- Creator
- defunct-datasets
- Year
- 2022
- License
- MIT
- Downloads
- 245
- Likes
- 152