princeton-nlp/SWE-bench
General NLP · English · Benchmark
princeton-nlp/SWE-bench is an English-language benchmark dataset, categorized under General NLP and distributed in Parquet format.
📊 This dataset is used as an LLM benchmark.
About princeton-nlp/SWE-bench
Dataset Summary
SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification ...
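To make "unit test verification" concrete, here is a minimal sketch of how a single instance might be judged. It assumes each instance names two groups of tests: ones that fail before the fix and must pass afterwards, and ones that already pass and must keep passing. The function `is_resolved`, the `"PASSED"` status string, and the test names are hypothetical and are not the official evaluation harness.

```python
from typing import Dict, Iterable


def is_resolved(
    test_status: Dict[str, str],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Return True if every required test passed after the candidate patch.

    test_status maps a test name to its outcome after the patch is applied.
    fail_to_pass lists tests the fix is expected to repair (assumed field).
    pass_to_pass lists tests that must not regress (assumed field).
    """
    required = list(fail_to_pass) + list(pass_to_pass)
    # A test missing from test_status counts as not passed.
    return all(test_status.get(name) == "PASSED" for name in required)


# Hypothetical outcomes for one instance (illustrative names only).
status = {
    "tests/test_parser.py::test_empty_input": "PASSED",
    "tests/test_parser.py::test_unicode": "PASSED",
    "tests/test_parser.py::test_legacy_mode": "FAILED",
}

fixed = is_resolved(
    status,
    fail_to_pass=["tests/test_parser.py::test_empty_input"],
    pass_to_pass=["tests/test_parser.py::test_unicode"],
)
regressed = is_resolved(
    status,
    fail_to_pass=["tests/test_parser.py::test_empty_input"],
    pass_to_pass=["tests/test_parser.py::test_legacy_mode"],
)
```

Under these assumptions, `fixed` is true because both required tests passed, while `regressed` is false because a previously passing test now fails.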
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: princeton-nlp
- Year: 2023