ai4bharat/samanantar
Text Generation | Translation | EN, AS, BN | cc-by-nc-4.0
Created by ai4bharat in 2022, ai4bharat/samanantar is a text generation and translation dataset in EN, AS, and BN containing 49,774,246 records in Parquet format. It has 2.6K downloads and 41 likes. It is released under the cc-by-nc-4.0 license and falls in the 10M<n<100M size category.
About ai4bharat/samanantar
Dataset Card for Samanantar
Dataset Summary
Samanantar is the largest publicly available collection of parallel corpora for Indic languages: Assamese, Bengali,
Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, Telugu.
...
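As a rough illustration of how the languages in the summary relate to tags such as AS and BN, the sketch below maps each language name to its ISO 639-1 code and wraps the Hugging Face `datasets` loader. The helper name `load_pair` is hypothetical, and it assumes the repository exposes one configuration per lowercase language code, which this page does not confirm.

```python
# ISO 639-1 codes for the eleven Indic languages named in the summary.
LANGUAGE_CODES = {
    "Assamese": "as",
    "Bengali": "bn",
    "Gujarati": "gu",
    "Hindi": "hi",
    "Kannada": "kn",
    "Malayalam": "ml",
    "Marathi": "mr",
    "Oriya": "or",
    "Punjabi": "pa",
    "Tamil": "ta",
    "Telugu": "te",
}


def load_pair(language, split="train"):
    """Stream the English/<language> pairs (hypothetical helper)."""
    # Imported lazily so the mapping above works without the library installed.
    from datasets import load_dataset

    # Assumption: one configuration per language code, e.g. "as" or "bn".
    return load_dataset(
        "ai4bharat/samanantar",
        LANGUAGE_CODES[language],
        split=split,
        streaming=True,  # avoid downloading tens of millions of rows up front
    )
```

Calling `load_pair("Assamese")` would then return an iterable dataset if the assumption about configuration names holds. Streaming is chosen here only because of the dataset's size.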
Details
- Task: Text Generation, Translation
- Language: EN, AS, BN
- Format: Parquet
- Rows / instances: 49,774,246
- Size: 10M<n<100M
- Creator: ai4bharat
- Year: 2022
- License: cc-by-nc-4.0
- Downloads: 2588
- Likes: 41