
allenai/dolma3_mix-6T-1025-7B

Text Generation | EN | odc-by

allenai/dolma3_mix-6T-1025-7B is an English (EN) text generation dataset from allenai, distributed in Parquet format under the odc-by license. It has been downloaded 124.7K times.

About allenai/dolma3_mix-6T-1025-7B

⚠️ WARNING: This dataset is intended ONLY for reproducing Olmo 3 7B ⚠️

For all other training use cases, including training from scratch, please utilize our primary dolma 3 data mix: https://huggingface.co/datasets/allenai/dolma3_mix-6T. Note: ...

Details

Task: Text Generation
Language: EN
Format: Parquet
Rows / instances: N/A
Creator: allenai
Year: 2025
License: odc-by
Downloads: 124738
Likes: 53
