adams-story/datacomp200m

General NLP, English

The adams-story/datacomp200m dataset is an English General NLP resource published by adams-story in 2026. It has 22.3K downloads and 2 likes, and falls in the 100M<n<1B size category.

About adams-story/datacomp200m

Datacomp200m. This is a smaller version of the datacomp_1b dataset. It was filtered by keeping all rows with a self-similarity (inner product) above 0.32, which resulted in 213,009,083 (213 million) rows. The results of the datacomp p...
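The threshold filter described above can be illustrated with a minimal sketch. The description does not say which embeddings the inner product is taken over, so this example assumes one image embedding and one text embedding per row. The column names `image_emb` and `text_emb`, the `uid` field, and the toy values are all hypothetical and are not taken from the dataset's actual schema.

```python
from typing import Iterable, Iterator, Sequence

THRESHOLD = 0.32  # cutoff stated in the dataset description


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embedding lengths differ")
    return sum(x * y for x, y in zip(a, b))


def filter_rows(rows: Iterable[dict], threshold: float = THRESHOLD) -> Iterator[dict]:
    """Yield only the rows whose similarity score is strictly above the threshold."""
    for row in rows:
        # Assumption: each row carries two embeddings under these hypothetical keys.
        score = inner_product(row["image_emb"], row["text_emb"])
        if score > threshold:
            yield row


# Toy rows with made-up 2-dimensional embeddings (illustration only).
rows = [
    {"uid": "a", "image_emb": [1.0, 0.0], "text_emb": [0.6, 0.8]},  # score 0.6, kept
    {"uid": "b", "image_emb": [1.0, 0.0], "text_emb": [0.0, 1.0]},  # score 0.0, dropped
    {"uid": "c", "image_emb": [0.0, 1.0], "text_emb": [0.8, 0.6]},  # score 0.6, kept
]
kept = [row["uid"] for row in filter_rows(rows)]
```

Because `filter_rows` is a generator, it can process rows one at a time, which matters at the scale of hundreds of millions of rows; whether the original filtering was done this way is not stated.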

Details

Task: General NLP
Language: English
Format: Parquet
Rows / instances: N/A
Size: 100M<n<1B
Creator: adams-story
Year: 2026
Downloads: 22314
Likes: 2
