VIdeO-and-Language INference (VIOLIN)
Multi-Modal Learning, English
The VIdeO-and-Language INference (VIOLIN) dataset is an English multi-modal learning resource published by Liu et al. in 2020, comprising 15,887 video clips.
About VIdeO-and-Language INference (VIOLIN)
The dataset contains 95,322 video-hypothesis pairs drawn from 15,887 video clips, spanning over 582 hours of video from YouTube and TV shows. Each clip is paired with annotated natural-language hypotheses describing inferences about its content, and the task is to decide whether the clip entails each hypothesis.
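As a quick sanity check on the figures above, the following sketch derives the number of hypotheses per clip and an approximate mean clip length. The constant and function names are illustrative only and are not part of the dataset or its tooling. The 582-hour total is stated as a lower bound, so the clip length is a rough estimate.

```python
# Figures quoted in the dataset description.
NUM_PAIRS = 95_322   # video-hypothesis pairs
NUM_CLIPS = 15_887   # video clips
TOTAL_HOURS = 582    # described as "over 582 hours", so treated as a lower bound


def dataset_ratios(num_pairs, num_clips, total_hours):
    """Return (hypotheses per clip, mean clip length in seconds)."""
    pairs_per_clip = num_pairs / num_clips
    mean_clip_seconds = total_hours * 3600 / num_clips
    return pairs_per_clip, mean_clip_seconds


pairs_per_clip, mean_clip_seconds = dataset_ratios(NUM_PAIRS, NUM_CLIPS, TOTAL_HOURS)
print(f"hypotheses per clip: {pairs_per_clip:.1f}")              # 6.0
print(f"mean clip length: at least {mean_clip_seconds:.0f} s")   # 132
```

The pair count is exactly six times the clip count, which is consistent with each clip carrying six hypotheses.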
Details
- Task: Multi-Modal Learning
- Language: English
- Format: JSON, H5
- Rows / instances: 15,887
- Creator: Liu et al.
- Year: 2020