JeanKaddour/minipile

Text Generation, Fill Mask, EN

JeanKaddour/minipile is an English dataset for text generation and fill-mask tasks, created by JeanKaddour, with 1,010,500 records in Parquet format. Its license is listed as "other", it falls in the 1M<n<10M size category, and it has been downloaded about 4.3K times.

About JeanKaddour/minipile

Dataset Card for MiniPile

Dataset Description: The MiniPile Challenge for Data-Efficient Language Models

Dataset Summary: MiniPile is a 6GB subset of the deduplicated The Pile corpus. To curate MiniPile, we perform a simple,...

Details

Task: Text Generation, Fill Mask
Language: EN
Format: Parquet
Rows / instances: 1,010,500
Size: 1M<n<10M
Creator: JeanKaddour
Year: 2023
License: other
Downloads: 4,315
Likes: 149
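
The details above can be turned into a small loading sketch. This is a hedged example, not an official recipe: it assumes the dataset is hosted on the Hugging Face Hub under the ID shown on this page, that the `datasets` library is installed, and that a split named "train" exists (the split name is an assumption, not stated here). The helper names `load_minipile` and `in_listed_size_category` are hypothetical.

```python
DATASET_ID = "JeanKaddour/minipile"  # repository ID as listed on this page
LISTED_ROWS = 1_010_500              # row count listed under Details


def load_minipile(split="train", streaming=True):
    """Load one split with the Hugging Face `datasets` library.

    Assumptions: `datasets` is installed, the dataset is reachable under
    DATASET_ID, and a split called `split` exists.
    """
    # Imported lazily so this sketch can be imported without the library.
    from datasets import load_dataset

    # streaming=True iterates over records without downloading every
    # Parquet file up front.
    return load_dataset(DATASET_ID, split=split, streaming=streaming)


def in_listed_size_category(n_rows):
    """Return True if n_rows falls in the listed 1M<n<10M bucket."""
    return 1_000_000 < n_rows < 10_000_000
```

Calling `load_minipile()` needs network access; `in_listed_size_category(LISTED_ROWS)` only checks that the listed row count is consistent with the listed size category.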