AlgorithmicResearchGroup/s2orc_full

Text Generation, Feature Extraction, Text Classification, EN, Benchmark, odc-by

Created by AlgorithmicResearchGroup in 2024, AlgorithmicResearchGroup/s2orc_full is an English-language text generation benchmark dataset containing 14,515,649 records in Parquet format. It has about 39.7K downloads and 0 likes, is released under the odc-by license, and falls in the 10M<n<100M size category.

This dataset is used as an LLM benchmark.

About AlgorithmicResearchGroup/s2orc_full

S2ORC Full: Semantic Scholar Open Research Corpus

A complete redistribution of the S2ORC dataset in Parquet format on Hugging Face, containing 14.5 million academic papers with full text, structured metadata, and citation information. ...
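Because the corpus holds roughly 14.5 million full-text records, downloading it in full just to inspect a few rows is rarely practical. The sketch below streams the first few records instead. It assumes the Hugging Face `datasets` library is installed and that the repository exposes a split named `train`; neither the split name nor the column names are stated on this page, and `peek_records` is a hypothetical helper introduced only for illustration.

```python
from itertools import islice

# Repository ID as given in this listing.
REPO_ID = "AlgorithmicResearchGroup/s2orc_full"


def peek_records(n=3, split="train"):
    """Return the first n records by streaming, without a full download.

    Assumptions (not confirmed by this page): the `datasets` library is
    installed and the repository has a split called "train".
    """
    # Imported lazily so that defining the helper does not need the library.
    from datasets import load_dataset

    # streaming=True yields records lazily instead of fetching every
    # Parquet shard up front.
    stream = load_dataset(REPO_ID, split=split, streaming=True)
    return list(islice(stream, n))
```

Calling `peek_records(3)` and printing `sorted(record.keys())` for each returned record is one way to discover the actual column names before writing any further processing code.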

Details

Task: Text Generation, Feature Extraction, Text Classification
Language: EN
Format: Parquet
Rows / instances: 14,515,649
Size: 10M<n<100M
Creator: AlgorithmicResearchGroup
Year: 2024
License: odc-by
Downloads: 39,656
Likes: 0