Machine Translation Datasets
There are 25 machine translation datasets in our directory, 1 of which is a benchmark. Each entry links to its source, paper, and download. Browse the full list below or filter by language.
Machine translation is the task of automatically converting text from one language to another while preserving its meaning.
Updated June 2026
- JW300 | Machine Translation | Multi-Lingual
- Tanzil | Machine Translation | Multi-Lingual
- Igbo Text | Text Corpora, Machine Translation | Igbo, English
- Urhobo Text | Text Corpora, Machine Translation | Urhobo, English
- ParaCrawl Corpus | Machine Translation | Multi-Lingual
- DiaBLa | Machine Translation, Dialogue | French, English
- LibriVoxDeEn | Speech Translation, Machine Translation | German, English
- Bible Corpus | Machine Translation | Multi-Lingual
- Bianet | Machine Translation | Multi-Lingual
- CAPES | Machine Translation | Portuguese, English
- DOGC | Text Corpora, Machine Translation | Catalan, Spanish
- ECB Corpus | Text Corpora, Machine Translation | Multi-Lingual
- EMEA | Machine Translation | Multi-Lingual
- Eubookshop | Text Corpora, Machine Translation | Multi-Lingual
- WMT 14 English-German | Machine Translation | Multi-Lingual
- WMT 15 English-Czech | Machine Translation | Multi-Lingual
- WMT 19 Multiple Datasets | Text Corpora, Machine Translation | Multi-Lingual
- Fiskmö | Machine Translation | Finnish, Swedish
- Books Corpus | Machine Translation | Multi-Lingual
- Web Inventory of Transcribed and Translated Talks (WIT3) | Machine Translation | Multi-Lingual
- IIT Bombay English-Hindi Corpus | Machine Translation | Hindi, English
- European Parliament Proceedings (Europarl) | Text Corpora, Machine Translation | Multi-Lingual
- Microsoft Speech Language Translation Corpus (MSLT) | Speech Recognition, Machine Translation | Multi-Lingual
- Worldwide News - Aggregate of 20K Feeds | Clustering, Events, Machine Translation | Multi-Lingual
- ParCorFull | Machine Translation, Coreference Resolution | German, English | Benchmark