DAMO-NLP-SG/multimodal_textbook
Text Generation, Summarization, EN, apache-2.0
Created by DAMO-NLP-SG in 2025, DAMO-NLP-SG/multimodal_textbook is an English-language dataset for text generation and summarization, distributed in Parquet format. It has 980 downloads and 164 likes. It is released under the apache-2.0 license and falls in the 1M<n<10M size category.
About DAMO-NLP-SG/multimodal_textbook
Multimodal-Textbook-6.5M
Overview
This dataset accompanies the paper "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining". It contains 6.5M images interleaved with 0.8B text from instructional videos.
It contai...
Details
- Task: Text Generation, Summarization
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 1M<n<10M
- Creator: DAMO-NLP-SG
- Year: 2025
- License: apache-2.0
- Downloads: 980
- Likes: 164