
HuggingFaceTB/cosmopedia

General NLP | EN | apache-2.0

The HuggingFaceTB/cosmopedia dataset is an English-language (EN) General NLP resource released by HuggingFaceTB in 2024, comprising 31,064,744 examples. It has about 19K downloads and 721 likes, is released under the apache-2.0 license, and falls in the 10M<n<100M size category.

About HuggingFaceTB/cosmopedia

Cosmopedia v0.1

(Image generated by DALL-E; the prompt was generated by Mixtral-8x7B-Instruct-v0.1.)

Note: Cosmopedia v0.2 is available at smollm-corpus.

User: What do you think "Cosmopedia" could mean? Hint: in our case it's not relate...

Details

Task: General NLP
Language: EN
Format: Parquet
Rows / instances: 31,064,744
Size: 10M<n<100M
Creator: HuggingFaceTB
Year: 2024
License: apache-2.0
Downloads: 19,030
Likes: 721
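The size label 10M<n<100M is a bucket on the row count: 31,064,744 rows fall between 10 million and 100 million. The sketch below shows one way such a label could be derived from a row count. It assumes power-of-ten boundaries, and the function name `size_bucket` is a hypothetical illustration, not Hugging Face's own code.

```python
# Hypothetical helper: maps a row count to a power-of-ten size bucket label
# in the "10M<n<100M" style shown on this page. The boundaries are an
# assumption for illustration, not an official Hugging Face implementation.

_UNITS = [(10**9, "B"), (10**6, "M"), (10**3, "K")]


def _label(value: int) -> str:
    """Render a power of ten such as 10_000_000 as '10M'."""
    for scale, suffix in _UNITS:
        if value >= scale:
            return f"{value // scale}{suffix}"
    return str(value)


def size_bucket(n_rows: int) -> str:
    """Return a label like '10M<n<100M' for the decade containing n_rows."""
    if n_rows < 0:
        raise ValueError("row count must be non-negative")
    if n_rows < 1_000:
        return "n<1K"
    # Walk up by factors of ten until the next decade would exceed n_rows.
    lower = 1_000
    while n_rows >= lower * 10:
        lower *= 10
    return f"{_label(lower)}<n<{_label(lower * 10)}"


print(size_bucket(31_064_744))
```

For this dataset, `size_bucket(31_064_744)` evaluates to `10M<n<100M`, which matches the Size field.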
