Question 1

What is the princeton-nlp/SWE-bench dataset?

Accepted Answer

Dataset Summary

SWE-bench is a dataset that tests systems’ ability to solve GitHub issues automatically. The dataset collects 2,294 Issue-Pull Request pairs from 12 popular Python repositories. Evaluation is performed by unit test verification ...

Question 2

Is princeton-nlp/SWE-bench a benchmark?

Accepted Answer

Yes. princeton-nlp/SWE-bench is used as a benchmark for LLMs. See the model leaderboards in the Benchmarks section.

Question 3

Where can I download princeton-nlp/SWE-bench?

Accepted Answer

princeton-nlp/SWE-bench is available at its source: https://huggingface.co/datasets/princeton-nlp/SWE-bench.
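
For readers who prefer to fetch the data programmatically rather than through the browser, the following is a minimal sketch using the Hugging Face `datasets` library (installed with `pip install datasets`). The helper names `dataset_url` and `load_swe_bench` are hypothetical, introduced only for illustration, and the split name `"test"` is an assumption; check the dataset page for the splits actually published.

```python
# Dataset ID as it appears in the download URL above.
DATASET_ID = "princeton-nlp/SWE-bench"


def dataset_url(dataset_id: str) -> str:
    """Build the Hugging Face Hub page URL for a dataset ID (illustrative helper)."""
    return f"https://huggingface.co/datasets/{dataset_id}"


def load_swe_bench(split: str = "test"):
    """Download one split of the dataset, or reuse the local cache.

    The default split name "test" is an assumption; adjust it if the
    dataset page lists different splits.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)


if __name__ == "__main__":
    print(dataset_url(DATASET_ID))
    ds = load_swe_bench()   # requires network access on first run
    print(len(ds))          # number of instances in the chosen split
    print(ds.column_names)  # inspect the available fields
```

The first call downloads the files and caches them locally, so later calls do not need to fetch them again.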
