allenai/dolma3_mix-6T-1025-7B
Text Generation, EN, odc-by
The allenai/dolma3_mix-6T-1025-7B dataset is an English-language (EN) text generation dataset published by allenai in Parquet format. It is distributed under the odc-by license and has been downloaded 124.7K times.
About allenai/dolma3_mix-6T-1025-7B
⚠️ WARNING: This dataset is intended ONLY for reproducing Olmo 3 7B ⚠️
For all other training use cases, including training from scratch, please use our primary Dolma 3 data mix: https://huggingface.co/datasets/allenai/dolma3_mix-6T.
Note: ...
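For anyone reproducing Olmo 3 7B who wants to inspect a few records before committing to a full download, the following is a minimal sketch rather than an official recipe. It assumes the Hugging Face `datasets` library is installed and that the repository exposes a `train` split; neither is stated on this page. The helper name `peek` is hypothetical.

```python
from itertools import islice

# Repository ID as given in the page title.
REPO_ID = "allenai/dolma3_mix-6T-1025-7B"


def peek(repo_id: str = REPO_ID, n: int = 3, split: str = "train"):
    """Return the first `n` records of the dataset as a list of dicts.

    Assumptions (not confirmed by this page): the `datasets` library is
    installed and a split named `train` exists in the repository.
    """
    # Imported lazily so this module can be loaded without `datasets`.
    from datasets import load_dataset

    # streaming=True reads records on demand instead of downloading
    # every Parquet file up front.
    stream = load_dataset(repo_id, split=split, streaming=True)
    return list(islice(stream, n))


if __name__ == "__main__":
    # Requires network access; field names vary, so print the keys only.
    for record in peek():
        print(sorted(record.keys()))
```

Streaming is used here because the mix is large; if the split name differs, pass it through the `split` argument.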
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: allenai
- Year: 2025
- License: odc-by
- Downloads: 124738
- Likes: 53