arcinstitute/opengenome2

Text Generation, English, Benchmark, apache-2.0

Created by arcinstitute in 2025, arcinstitute/opengenome2 is an English-language text generation benchmark dataset in Parquet format. It has 6.5K downloads and 147 likes, is released under the apache-2.0 license, and falls in the n>1T size category.

📊 This dataset is used as an LLM benchmark.

About arcinstitute/opengenome2

OpenGenome2 is a database of nearly 9 trillion base pairs of curated DNA from across all domains of life. Collected from diverse species and public data sources, OpenGenome2 was used to train Evo 2 models. Please refer to the Ev...

Details

Task: Text Generation
Language: English
Format: Parquet
Rows / instances: N/A
Size: n>1T
Creator: arcinstitute
Year: 2025
License: apache-2.0
Downloads: 6538
Likes: 147