Spanish Datasets
We catalog 7 Spanish datasets for NLP and machine learning. Browse the list below or narrow down by task.
Spanish is one of the most widely spoken languages in the world and a high-resource language for NLP.
Updated June 2026
- Mercadolibre Data Challenge 2019 | Text Classification | Portuguese, Spanish
- CC100-Spanish | Text Corpora | Spanish
- DOGC | Text Corpora, Machine Translation | Catalan, Spanish
- Conference on Computational Natural Language Learning (CoNLL 2002) | Named Entity Recognition (NER) | Spanish, Dutch
- OpenAssistant/oasst1 | General NLP | EN, ES, RU
- ShadenA/MathNet | Question Answering, Text Generation, Image To Text | EN, PT, ES
- LEMAS-Project/LEMAS-Dataset-train | Text To Speech, Automatic Speech Recognition | IT, PT, ES