Question 1

What is the Self-Annotated Reddit Corpus (SARC) dataset?

Accepted Answer

The dataset contains 1.3 million sarcastic comments from the Internet commentary website Reddit. It includes the statements along with their responses, as well as many non-sarcastic comments from the same source.

Question 2

Is Self-Annotated Reddit Corpus (SARC) a benchmark?

Accepted Answer

Yes. The Self-Annotated Reddit Corpus (SARC) is used as an LLM benchmark. See the model leaderboards in the Benchmarks section.

Question 3

Where can I download Self-Annotated Reddit Corpus (SARC)?

Accepted Answer

Self-Annotated Reddit Corpus (SARC) is available at its source: https://nlp.cs.princeton.edu/SARC/.
