allenai/dolma3_mix-6T

Text Generation | EN | odc-by

allenai/dolma3_mix-6T is an English-language text generation dataset published by allenai in 2025. It has 71.5K downloads and 33 likes, and it is released under the odc-by (Open Data Commons Attribution) license.

About allenai/dolma3_mix-6T

Dolma 3 Mix (6T) is the collection of data used during the pretraining stage to train the Olmo-3-1125-32B model. The dataset comprises ~6 trillion tokens from a diverse mix of web content, academic publications, code, ...
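
Because the mix is roughly 6 trillion tokens, downloading it in full just to inspect it is rarely practical. The following is a minimal sketch of peeking at a few rows in streaming mode with the Hugging Face `datasets` library. It assumes the dataset resolves under the ID shown on this page, that a `train` split exists, and that no configuration name is required. None of that is confirmed by this page, so check the dataset card before relying on it. The helper name `peek_rows` and its `loader` parameter are hypothetical and exist only for illustration.

```python
from itertools import islice

# Dataset ID as shown on this page.
DATASET_ID = "allenai/dolma3_mix-6T"


def peek_rows(n=3, split="train", loader=None):
    """Return the first n rows of the dataset without downloading all of it.

    `split="train"` is an assumption; the page does not list the splits.
    `loader` is a hypothetical hook so a stand-in can be supplied for testing;
    by default it is `datasets.load_dataset`.
    """
    if loader is None:
        # Imported lazily so the module can be loaded without `datasets` installed.
        from datasets import load_dataset
        loader = load_dataset
    # streaming=True yields rows lazily instead of materializing the Parquet files.
    stream = loader(DATASET_ID, split=split, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    for row in peek_rows(3):
        # Column names are not listed on this page, so only the keys are printed.
        print(sorted(row.keys()))
```

Streaming keeps disk and memory use small, at the cost of sequential access only. If loading fails with a message about a missing configuration or split, the assumptions above do not hold for this dataset and the arguments need adjusting to match the dataset card.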

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: allenai
Year: 2025
License: odc-by
Downloads: 71487
Likes: 33
