
Books Corpus

Machine Translation · Multi-Lingual

Created by Tiedemann in 2012, the Books Corpus is a multilingual machine translation dataset containing 0.91M records in XCES/XML format.

About Books Corpus

The dataset is a collection of copyright-free books. The corpus covers 16 languages and comprises 0.91M sentence fragments and 19.50M tokens.

Details

Task: Machine Translation
Language: Multi-Lingual
Format: XCES, XML
Rows / instances: 0.91M
Creator: Tiedemann
Year: 2012
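The corpus is distributed in XCES, an XML format. The Python sketch below uses only the standard library to read sentence links from an XCES alignment file. It assumes the OPUS-style layout: a `cesAlign` root; `linkGrp` elements with `fromDoc`/`toDoc` attributes; and `link` elements whose `xtargets` attribute lists source sentence IDs and target sentence IDs, separated by `;`. The `SentenceLink` type, the `parse_xces_alignment` function and the sample file names are hypothetical illustrations, not part of the corpus release.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Tuple


class SentenceLink(NamedTuple):
    """One alignment link between source and target sentences (hypothetical type)."""
    from_doc: str
    to_doc: str
    source_ids: Tuple[str, ...]
    target_ids: Tuple[str, ...]


def parse_xces_alignment(xml_text: str) -> List[SentenceLink]:
    """Parse an XCES sentence alignment document (assumes the OPUS-style layout)."""
    root = ET.fromstring(xml_text)
    links: List[SentenceLink] = []
    # Each linkGrp pairs one source document with one target document.
    for group in root.iter("linkGrp"):
        from_doc = group.get("fromDoc", "")
        to_doc = group.get("toDoc", "")
        for link in group.iter("link"):
            # Assumed xtargets syntax: "s2 s3;s2" means source IDs, ';', target IDs.
            # An empty side (e.g. "s4;") marks a sentence with no counterpart.
            src, _, tgt = link.get("xtargets", "").partition(";")
            links.append(
                SentenceLink(from_doc, to_doc, tuple(src.split()), tuple(tgt.split()))
            )
    return links


# Hypothetical sample file, written by hand for illustration only.
SAMPLE = """<cesAlign version="1.0">
  <linkGrp targType="s" fromDoc="en/book1.xml" toDoc="fr/book1.xml">
    <link xtargets="s1;s1"/>
    <link xtargets="s2 s3;s2"/>
    <link xtargets="s4;"/>
  </linkGrp>
</cesAlign>"""

if __name__ == "__main__":
    for link in parse_xces_alignment(SAMPLE):
        print(link.source_ids, "->", link.target_ids)
```

Returning ID tuples instead of sentence text keeps the parser independent of the per-language sentence files. Resolving IDs to text would require also reading the documents named in `fromDoc` and `toDoc`.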
