
commoncrawl/host-index-testing-v2

Text Generation, English

commoncrawl/host-index-testing-v2 is an English text-generation dataset distributed in Parquet format. It falls in the 10B<n<100B size category and has been downloaded 11.8K times.

About commoncrawl/host-index-testing-v2

Common Crawl Host Index v2
GitHub: https://github.com/commoncrawl/cc-host-index
Each crawl, we generate a Host Index, which aggregates information about each web host visited during the crawl. The information is aggregated from the Common Cr...
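To illustrate what "aggregating information about each web host" means, here is a minimal Python sketch that groups per-URL crawl records by hostname. It is only an illustration under stated assumptions. The record shape `(url, http_status)`, the function name `aggregate_by_host`, and the counters `fetches`/`ok`/`not_ok` are all hypothetical. They are not the actual Host Index schema or the cc-host-index pipeline, both of which are documented in the GitHub repository above.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def aggregate_by_host(records):
    """Group hypothetical (url, http_status) crawl records by hostname.

    Assumption: this record shape and these counters are illustrative only;
    the real Host Index defines its own columns.
    """
    hosts = defaultdict(lambda: {"fetches": 0, "ok": 0, "not_ok": 0})
    for url, status in records:
        host = urlsplit(url).hostname  # lowercased, with any port stripped
        if host is None:
            continue  # skip malformed URLs that have no host component
        stats = hosts[host]
        stats["fetches"] += 1
        if 200 <= status < 300:
            stats["ok"] += 1  # 2xx responses
        else:
            stats["not_ok"] += 1  # everything else (redirects, errors, ...)
    return dict(hosts)


records = [
    ("https://example.com/", 200),
    ("https://example.com/about", 404),
    ("http://Example.org:8080/x", 200),
]
summary = aggregate_by_host(records)
for host, stats in sorted(summary.items()):
    print(host, stats)
```

Keying on `urlsplit(...).hostname` rather than the raw netloc means `Example.org:8080` and `example.org` land in the same bucket. A real per-host index would also need to decide how to treat ports and schemes.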

Details

Task: Text Generation
Language: English
Format: Parquet
Rows / instances: N/A
Size: 10B<n<100B
Creator: commoncrawl
Year: 2025
Downloads: 11796
Likes: 0