ai4bharat/IndicVoices
General NLP | English | cc-by-4.0
Created by ai4bharat in 2025, ai4bharat/IndicVoices is listed as a General NLP dataset in English containing 6,379,012 records in Parquet format. It has 11.3K downloads and 70 likes, is released under the cc-by-4.0 license, and falls in the 1M<n<10M size category.
About ai4bharat/IndicVoices
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Updates
[23 December 2025] We now have 11,200 hours of transcribed data! 🎉
Overview
INDICVOICES is a dataset...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: 6,379,012
- Size: 1M<n<10M
- Creator: ai4bharat
- Year: 2025
- License: cc-by-4.0
- Downloads: 11,316
- Likes: 70
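The Size entry is a coarse bucket, not an exact count. The sketch below shows how a row count such as 6,379,012 maps to the 1M<n<10M label. It assumes the bucket follows the power-of-ten `size_categories` convention used on the Hugging Face Hub. The helper `size_category` and the `BUCKETS` table are hypothetical illustrations, not part of the dataset or any library.

```python
# Hypothetical helper: maps a row count to a Hugging Face-style size_categories label.
# Assumption: buckets follow the Hub's power-of-ten convention; labels above 1B are
# collapsed into a single fallback for brevity.
BUCKETS = [
    (1_000, "n<1K"),
    (10_000, "1K<n<10K"),
    (100_000, "10K<n<100K"),
    (1_000_000, "100K<n<1M"),
    (10_000_000, "1M<n<10M"),
    (100_000_000, "10M<n<100M"),
    (1_000_000_000, "100M<n<1B"),
]


def size_category(num_rows: int) -> str:
    """Return the first bucket whose upper bound is strictly greater than num_rows.

    A count exactly on a boundary (e.g. 1,000,000) lands in the higher bucket.
    """
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    for upper, label in BUCKETS:
        if num_rows < upper:
            return label
    return "n>=1B"  # finer buckets above 1B omitted in this sketch


if __name__ == "__main__":
    # Row count taken from the "Rows / instances" entry above.
    print(size_category(6_379_012))  # prints 1M<n<10M
```

Checking the count against the upper bound of each bucket in ascending order keeps the lookup to a single pass and makes the boundary rule explicit.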