ajibawa-2023/Java-Code-Large
ajibawa-2023/Java-Code-Large is an English-language text generation dataset published by ajibawa-2023 in Parquet format. It is distributed under the MIT license, falls in the 10M<n<100M size category (between 10 million and 100 million instances), and has been downloaded about 1.2K times.
About ajibawa-2023/Java-Code-Large
Java-Code-Large
Java-Code-Large is a large-scale corpus of publicly available Java source code comprising more than 15 million Java code samples. The dataset is designed to support research in large language model (LLM) pretraining, code intelligence, so...
Details
- Task: Text Generation
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 10M<n<100M
- Creator: ajibawa-2023
- Year: 2026
- License: mit
- Downloads: 1213
- Likes: 32
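
Because the corpus holds more than 15 million entries, it is usually more practical to stream a few rows than to download everything first. The following is a minimal sketch, assuming the dataset is hosted on the Hugging Face Hub under the repository name shown on this page and that the third-party `datasets` package is installed. The split name `"train"` and the helper names `stream_rows` and `take` are assumptions for illustration; this page does not list the dataset's splits or columns.

```python
from itertools import islice

# Repository name as given on this page.
REPO_ID = "ajibawa-2023/Java-Code-Large"


def stream_rows(repo_id=REPO_ID, split="train"):
    """Return a lazily streamed iterable of rows.

    Assumption: a split named "train" exists; this page does not say so.
    The import is local so the module loads even without `datasets`.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    # streaming=True yields rows on demand instead of downloading
    # every Parquet file up front.
    return load_dataset(repo_id, split=split, streaming=True)


def take(rows, limit=3):
    """Collect at most `limit` items from any iterable of rows."""
    return list(islice(rows, limit))
```

Calling `take(stream_rows())` would fetch only the first few rows over the network. Inspecting the keys of one returned row is a reasonable way to discover the column names, which are not stated here.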