
Helsinki-NLP/tatoeba

Translation | AB, ACM, ADY | cc-by-2.0

Created by Helsinki-NLP in 2022, Helsinki-NLP/tatoeba is a translation dataset listed with the languages AB, ACM, ADY and containing 413,190 records in Parquet format. It has about 2.2K downloads and 56 likes. It is released under the cc-by-2.0 license and carries the 10K<n<100K size tag.

About Helsinki-NLP/tatoeba

This is a collection of translated sentences from Tatoeba.
359 languages, 3,403 bitexts
total number of files: 750
total number of tokens: 65.54M
total number of sentence fragments: 8.96M

Details

Task: Translation
Language: AB, ACM, ADY
Format: Parquet
Rows / instances: 413,190
Size: 10K<n<100K
Creator: Helsinki-NLP
Year: 2022
License: cc-by-2.0
Downloads: 2,165
Likes: 56