Korean Datasets
This page covers Korean, an East Asian language with a growing body of NLP datasets and models. Our directory catalogs 10 Korean datasets for NLP and machine learning; each entry below gives the dataset name, its task, and its language codes.
Updated June 2026
- CC100-Korean: Text Corpora (Korean)
- MarkrAI/KoCommercial-Dataset: General NLP (KO)
- Korean Single Speaker Dataset (KSS): Text-to-Speech (Korean)
- daekeun-ml/naver-news-summarization-ko: Summarization (KO)
- KorQuAD: Question Answering, Reading Comprehension (Korean)
- data-is-better-together/fineweb-c: Text Classification (LVS, KOR, KIN)
- nvidia/Nemotron-Personas-Korea: Text Generation (KO)
- nlpai-lab/kullm-v2: Text Generation (KO)
- Jackrong/Claude-opus-4.6-TraceInversion-9000x: Text Generation (EN, ZH, KO)
- Jackrong/Claude-opus-4.7-TraceInversion-5000x: Text Generation (EN, ZH, KO)