
jedibear/s2orc_full

Text Generation, Feature Extraction, Text Classification, EN, odc-by

jedibear/s2orc_full is an English-language text generation dataset published by jedibear, with 14,515,649 records in Parquet format. It is distributed under the odc-by license, falls in the 10M<n<100M size category, and has been downloaded 23.7K times.

About jedibear/s2orc_full

S2ORC Full: Semantic Scholar Open Research Corpus

A complete redistribution of the S2ORC dataset in Parquet format on Hugging Face, containing 14.5 million academic papers with full text, structured metadata, and citation information. ...
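Because the corpus holds millions of Parquet records, it can be easier to inspect a few rows before committing to a full download. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository exposes a split named "train". The split name and the helper `peek_records` are illustrative assumptions, not details confirmed by this page.

```python
from itertools import islice

# Repository name as listed on this page.
REPO_ID = "jedibear/s2orc_full"


def peek_records(n=3, split="train"):
    """Return the first n records without downloading the whole corpus.

    The split name "train" is an assumption; check the dataset homepage
    for the splits that are actually published.
    """
    # Imported lazily so this module still loads where `datasets` is absent.
    from datasets import load_dataset

    # streaming=True reads the remote files lazily instead of fetching
    # every Parquet shard up front.
    stream = load_dataset(REPO_ID, split=split, streaming=True)

    # islice stops after n records, so only a small amount of data is read.
    return list(islice(stream, n))
```

Calling `peek_records(2)` should return a list of two dictionaries, one per paper. The field names are not documented on this page, so listing each record's keys is a reasonable first step before relying on any particular column.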

Details

Task: Text Generation, Feature Extraction, Text Classification
Language: EN
Format: Parquet
Rows / instances: 14,515,649
Size: 10M<n<100M
Creator: jedibear
Year: 2026
License: odc-by
Downloads: 23,721
Likes: 0
