stanford-oval/ccnews
stanford-oval/ccnews is a multilingual dataset (language tags include AF and AM) from stanford-oval, distributed in Parquet format and tagged for text classification, question answering, and text generation.
About stanford-oval/ccnews
This dataset is the result of processing all WARC files in the CCNews Corpus, from the beginning (2016) to June of 2024.
The data has been cleaned and deduplicated, and the language of each article has been detected and added. The process is similar to w...
Details
- Task: Text Classification, Question Answering, Text Generation
- Language: MULTILINGUAL, AF, AM
- Format: Parquet
- Rows / instances: N/A
- Creator: stanford-oval
- Year: 2024
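Because the articles carry a detected language and the data ships as Parquet, a common first step is to stream records and keep only one language. The following is a minimal sketch, not the creators' own tooling. It assumes the Hugging Face `datasets` library is installed, that the language label lives in a column named `language` (a hypothetical name; check the actual schema), and that the default configuration and a `train` split can be loaded without extra arguments, which may not hold for this repository.

```python
from typing import Dict, Iterable, Iterator


def filter_by_language(
    records: Iterable[Dict], lang: str, field: str = "language"
) -> Iterator[Dict]:
    """Yield only the records whose `field` value equals `lang`.

    `field` defaults to "language", which is an assumed column name.
    Records that lack the field are skipped.
    """
    for record in records:
        if record.get(field) == lang:
            yield record


def stream_ccnews(lang: str) -> Iterator[Dict]:
    """Lazily iterate over articles in one language (requires network access)."""
    # Imported here so filter_by_language works without the library installed.
    from datasets import load_dataset

    # streaming=True reads records on demand instead of downloading everything.
    # Assumption: no config name is required and the split is called "train".
    dataset = load_dataset("stanford-oval/ccnews", split="train", streaming=True)
    return filter_by_language(dataset, lang)


# Toy records for illustration only; these are not taken from the dataset.
sample = [
    {"language": "af", "title": "first"},
    {"language": "am", "title": "second"},
    {"title": "no language label"},
]
afrikaans_titles = [r["title"] for r in filter_by_language(sample, "af")]
```

Filtering on the client keeps the sketch independent of the real schema. If the repository exposes per-language or per-year configurations, selecting one at load time would avoid scanning unrelated records.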