
VIdeO-and-Language INference (VIOLIN)

Multi-Modal Learning, English

The VIdeO-and-Language INference (VIOLIN) dataset is an English multi-modal learning resource published by Liu et al. in 2020, comprising 15,887 video clips.

About VIdeO-and-Language INference (VIOLIN)

The dataset contains 95,322 video-hypothesis pairs drawn from 15,887 video clips, spanning over 582 hours of video from YouTube and TV shows. Annotators wrote natural-language hypotheses about the content of each clip, and each pair is used to test whether the hypothesis is entailed by the video clip.
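The two counts above imply a fixed number of hypotheses per clip, which can be checked with a short calculation. The following is a minimal sketch, not the dataset's actual schema: the `VideoHypothesisPair` class and its field names (`clip_id`, `hypothesis`, `entailed`) are hypothetical and only illustrate what one video-hypothesis pair might carry.

```python
from dataclasses import dataclass


@dataclass
class VideoHypothesisPair:
    """Hypothetical record layout; the real JSON fields may differ."""
    clip_id: str      # assumed identifier of the source video clip
    hypothesis: str   # natural-language statement about the clip
    entailed: bool    # assumed label: True if the clip entails the hypothesis


# Counts taken from the dataset description.
NUM_PAIRS = 95_322
NUM_CLIPS = 15_887

# Integer division shows how many pairs each clip contributes.
pairs_per_clip, remainder = divmod(NUM_PAIRS, NUM_CLIPS)
print(pairs_per_clip, remainder)  # 6 0

# Illustrative record with made-up values.
example = VideoHypothesisPair(
    clip_id="clip_0001",
    hypothesis="The woman hands the man a cup of coffee.",
    entailed=True,
)
```

The pair count divides evenly, so the stated totals are consistent with six hypotheses per clip.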

Details

Task: Multi-Modal Learning
Language: English
Format: JSON, H5
Rows / instances: 15,887
Creator: Liu et al.
Year: 2020
