Question 1

What is the CShorten/ML-ArXiv-Papers dataset?

Accepted Answer

This dataset is the subset of arXiv papers tagged "cs.LG", the arXiv category for Machine Learning.
It is filtered from the full arXiv dataset hosted on Kaggle: https://www.kaggle.com/datasets/Cornell-University/ar...
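The filtering step described above can be sketched in Python. This is a minimal, hypothetical illustration, not the dataset author's actual script. It assumes the Kaggle snapshot is stored as JSON lines and that each record has a space-separated `categories` string (for example "cs.LG stat.ML"). It keeps every paper whose categories include cs.LG, which means cross-listed papers are kept too. The function name `iter_cs_lg` is made up for this example.

```python
import json
from typing import Iterable, Iterator


def iter_cs_lg(lines: Iterable[str], tag: str = "cs.LG") -> Iterator[dict]:
    """Yield metadata records whose categories include `tag`.

    Hypothetical helper. It assumes one JSON object per line, each with a
    space-separated 'categories' string such as "cs.LG stat.ML".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Split on whitespace so that only whole category codes can match.
        if tag in record.get("categories", "").split():
            yield record
```

For example, `with open(path) as f: ml_papers = list(iter_cs_lg(f))`, where `path` points to your local copy of the snapshot. A generator is used because the full snapshot is large. Records are read one line at a time rather than loaded into memory all at once.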

Question 2

Is CShorten/ML-ArXiv-Papers a benchmark?

Accepted Answer

CShorten/ML-ArXiv-Papers is a dataset for training or evaluation; it isn't tracked as a standard LLM benchmark in our catalog.

Question 3

Where can I download CShorten/ML-ArXiv-Papers?

Accepted Answer

CShorten/ML-ArXiv-Papers is available at its source: https://huggingface.co/datasets/CShorten/ML-ArXiv-Papers.
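A minimal sketch of loading the dataset with the Hugging Face `datasets` library follows. It needs `pip install datasets` and network access. The wrapper name `load_ml_arxiv_papers` is hypothetical, and the default `"train"` split is an assumption about how the repository is organized.

```python
from typing import Optional


def load_ml_arxiv_papers(split: Optional[str] = "train"):
    """Load CShorten/ML-ArXiv-Papers from the Hugging Face Hub.

    Hypothetical wrapper. It assumes the repository exposes a "train" split.
    Pass split=None to get a DatasetDict containing every available split.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("CShorten/ML-ArXiv-Papers", split=split)
```

Usage: `papers = load_ml_arxiv_papers()` followed by `print(papers.column_names)`. Check the actual column names this way before relying on any specific field.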

Question 4

What license is CShorten/ML-ArXiv-Papers released under?

Accepted Answer

CShorten/ML-ArXiv-Papers is distributed under the Academic Free License 3.0 (afl-3.0).
