DAMO-NLP-SG/multimodal_textbook

Text Generation, Summarization, EN, apache-2.0

Created by DAMO-NLP-SG in 2025, DAMO-NLP-SG/multimodal_textbook is an English text generation and summarization dataset in Parquet format. It has 980 downloads and 164 likes, is released under the apache-2.0 license, and falls in the 1M<n<10M size category.

About DAMO-NLP-SG/multimodal_textbook

Multimodal-Textbook-6.5M overview: this dataset accompanies "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" and contains 6.5M images interleaved with 0.8B text from instructional videos. It contai...

Details

Task: Text Generation, Summarization
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 1M<n<10M
Creator: DAMO-NLP-SG
Year: 2025
License: apache-2.0
Downloads: 980
Likes: 164