JeanKaddour/minipile
Text Generation, Fill Mask, EN
JeanKaddour/minipile is an English-language text generation dataset from JeanKaddour with 1,010,500 records in Parquet format. Its license is listed as "other", it falls in the 1M<n<10M size category, and it has been downloaded about 4.3K times.
About JeanKaddour/minipile
Dataset Card for MiniPile
Dataset Description
The MiniPile Challenge for Data-Efficient Language Models
Dataset Summary
MiniPile is a 6GB subset of the deduplicated version of The Pile corpus. To curate MiniPile, we perform a simple,...
Details
- Task: Text Generation, Fill Mask
- Language: EN
- Format: Parquet
- Rows / instances: 1,010,500
- Size: 1M<n<10M
- Creator: JeanKaddour
- Year: 2023
- License: other
- Downloads: 4,315
- Likes: 149