Self-Annotated Reddit Corpus (SARC)

Text Corpora, Sarcasm Detection, English, Benchmark

Self-Annotated Reddit Corpus (SARC) is an English text corpus and sarcasm detection benchmark dataset from Khodak et al., with 1.3M records in CSV format.

📊 This dataset is used as an LLM benchmark.

About Self-Annotated Reddit Corpus (SARC)

The dataset contains 1.3 million sarcastic comments from the Internet commentary website Reddit. It pairs statements with their responses and also includes many non-sarcastic comments from the same source.

Details

Task: Text Corpora, Sarcasm Detection
Language: English
Format: CSV
Rows / instances: 1.3M
Creator: Khodak et al.
Year: 2017
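
With 1.3M rows in CSV format, it may be more practical to stream the file row by row than to load it all at once. The sketch below tallies sarcastic versus non-sarcastic rows using only the Python standard library. The column names (`label`, `comment`, `parent_comment`) and the sample rows are hypothetical placeholders, not the dataset's confirmed schema; check the header of the actual file before relying on them.

```python
import csv
import io
from collections import Counter


def count_labels(rows, label_column="label"):
    """Stream rows from a CSV file object and tally the values in one column.

    `label_column` is an assumed column name, not a confirmed SARC field.
    """
    reader = csv.DictReader(rows)
    counts = Counter()
    for record in reader:
        counts[record[label_column]] += 1
    return counts


# Tiny in-memory stand-in for the real file; columns and rows are made up.
sample = io.StringIO(
    "label,comment,parent_comment\n"
    "1,Oh great another Monday,How is your week going?\n"
    "0,I think it opens at nine,When does the library open?\n"
    "1,Yeah because that always works,Just restart it\n"
)

label_counts = count_labels(sample)
print(label_counts)
```

For a local copy of the data, the same function could be given a file object opened with `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the CSV was saved.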
