abuelkhair-corpus/arabic_billion_words
Text Generation, Fill Mask, AR
Abuelkhair-corpus/arabic_billion_words is an Arabic (AR) dataset for text generation and fill-mask tasks that provides 5,370,082 examples in Parquet format. Its license is listed as unknown, and it has been downloaded 220 times. It is tagged with the 100K<n<1M size category, although the stated row count exceeds that range.
About abuelkhair-corpus/arabic_billion_words
The Abu El-Khair Corpus is an Arabic text corpus that includes more than five million newspaper articles.
It contains over a billion and a half words in total, of which about three million are unique.
The corpus is encoded with two t...
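To make the distinction between total words and unique words concrete, here is a minimal sketch of how such counts could be computed. The function name `word_stats`, the whitespace tokenization, and the two sample sentences are illustrative assumptions; they are not taken from the corpus or from the authors' own counting method.

```python
from collections import Counter


def word_stats(articles):
    """Return (total_words, unique_words) for an iterable of article strings."""
    counts = Counter()
    for text in articles:
        # Assumption: words are separated by whitespace. The corpus authors
        # may have used a different tokenization or normalization.
        counts.update(text.split())
    return sum(counts.values()), len(counts)


# Hypothetical toy sample, not drawn from the corpus.
sample = [
    "ذهب الولد إلى المدرسة",
    "ذهب المعلم إلى المدرسة",
]

total, unique = word_stats(sample)
print(total, unique)
```

On real data, the unique-word count depends heavily on how punctuation, diacritics, and spelling variants are normalized, so figures computed this way may not match the published ones.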
Details
- Task: Text Generation, Fill Mask
- Language: AR
- Format: Parquet
- Rows / instances: 5,370,082
- Size: 100K<n<1M
- Creator: abuelkhair-corpus
- Year: 2022
- License: unknown
- Downloads: 220
- Likes: 34