
CodeSearchNet Corpus

Text Corpora, English, Benchmark

The CodeSearchNet Corpus is an English text corpus published by Husain et al. in 2019, comprising about 6M examples.

This dataset is used as an LLM benchmark.

About CodeSearchNet Corpus

The dataset contains functions with their associated documentation, written in Go, Java, JavaScript, PHP, Python, and Ruby, drawn from open-source projects on GitHub.

Details

Task: Text Corpora
Language: English
Format: JSON
Rows / instances: 6M
Creator: Husain et al.
Year: 2019
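
Since the corpus pairs functions with their documentation and is distributed as JSON, a minimal sketch of reading such records may help. It assumes one JSON object per line, and the field names `language`, `func_name`, `code`, and `docstring` are assumptions for illustration, as are the three sample records; check the actual files before relying on them.

```python
import json
from collections import Counter

# Hypothetical sample records standing in for a corpus file.
# Field names are assumed for illustration, not taken from this page.
SAMPLE_JSONL = """\
{"language": "python", "func_name": "add", "code": "def add(a, b):\\n    return a + b", "docstring": "Return the sum of a and b."}
{"language": "go", "func_name": "Add", "code": "func Add(a, b int) int { return a + b }", "docstring": "Add returns the sum of a and b."}
{"language": "python", "func_name": "noop", "code": "def noop():\\n    pass", "docstring": ""}
"""


def iter_pairs(lines):
    """Yield (language, code, docstring) for records with non-empty documentation."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        doc = record.get("docstring", "").strip()
        if doc:
            yield record["language"], record["code"], doc


pairs = list(iter_pairs(SAMPLE_JSONL.splitlines()))
# Count documented functions per language.
counts = Counter(lang for lang, _, _ in pairs)
```

Filtering out records with an empty docstring keeps only the function/documentation pairs; for a real file, the same generator could be fed an open file handle instead of the in-memory sample.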
