ScaleAI/SWE-bench_Pro
General NLP, English, Benchmark
Created by ScaleAI in 2025, ScaleAI/SWE-bench_Pro is an English-language General NLP benchmark dataset distributed in Parquet format.
This dataset is used as an LLM benchmark.
About ScaleAI/SWE-bench_Pro
Dataset Summary
SWE-Bench Pro is a challenging, enterprise-level dataset for testing agent ability on long-horizon software engineering tasks.
Paper: https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf
See the r...
Details
- Task: General NLP
- Language: English
- Format: Parquet
- Rows / instances: N/A
- Creator: ScaleAI
- Year: 2025