nvidia/Nemotron-Pretraining-Code-v1
Text Generation, English
nvidia/Nemotron-Pretraining-Code-v1 is an English-language text generation dataset distributed in Parquet format.
About nvidia/Nemotron-Pretraining-Code-v1
Nemotron-Pre-Training-Dataset-v1 Release
Data Overview
This pretraining dataset for generative AI models preserves high-value math and code while enriching it with diverse multilingual Q&A, fueling the next generation of in...
Details
- Task: Text Generation
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: nvidia
- Year: 2025