alvanlii/cantonese-youtube
Automatic Speech Recognition · Audio Classification · ZH, YUE
The alvanlii/cantonese-youtube dataset is an automatic speech recognition resource in Chinese and Cantonese (ZH, YUE), published by alvanlii in 2024 and comprising 1,478,373 examples. It has 551 downloads and 51 likes, and it falls in the 1M<n<10M size category.
About alvanlii/cantonese-youtube
Cantonese YouTube Pseudo-Transcription Dataset
- Contains approximately 10k hours of audio sourced from YouTube
- Videos are chosen at random and scraped channel by channel
- Covers news, vlogs, entertainment, stories, and health content
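For a rough sense of segment length, the approximate 10k hours of audio can be divided by the 1,478,373 examples listed for this dataset. The sketch below is only a back-of-the-envelope estimate: the hour count is approximate, and the function name `average_clip_seconds` is a hypothetical helper, not part of the dataset.

```python
def average_clip_seconds(total_hours, num_examples):
    """Return the mean clip length in seconds for a given total duration."""
    return total_hours * 3600 / num_examples


# Figures taken from the listing; 10_000 hours is an approximation.
avg = average_clip_seconds(total_hours=10_000, num_examples=1_478_373)
print(f"Estimated average clip length: {avg:.1f} s")
```

Under these figures the estimate comes out at roughly 24 seconds per example. The actual distribution of clip lengths is not stated in the listing.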
Columns
transcript...
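Since the data is published in Parquet format, one way to inspect it is to stream rows with the Hugging Face `datasets` library instead of downloading everything first. This is a minimal sketch under stated assumptions: the split name `"train"` is assumed, only the `transcript` column is confirmed by the listing, and `preview_column` and `stream_rows` are hypothetical helper names.

```python
from itertools import islice


def preview_column(rows, column="transcript", n=3):
    """Collect `column` from the first `n` rows of any iterable of dict-like rows."""
    return [row[column] for row in islice(rows, n)]


def stream_rows(repo_id="alvanlii/cantonese-youtube", split="train"):
    """Open the dataset lazily; the split name "train" is an assumption."""
    # Imported here so preview_column stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


# Example (needs network access and the `datasets` package):
# for text in preview_column(stream_rows()):
#     print(text)
```

Streaming avoids fetching all of the roughly 10k hours of audio just to look at a few transcripts.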
Details
- Task: Automatic Speech Recognition, Audio Classification
- Language: ZH, YUE
- Format: Parquet
- Rows / instances: 1,478,373
- Size: 1M<n<10M
- Creator: alvanlii
- Year: 2024
- Downloads: 551
- Likes: 51