LLaVA-OV-72B
LLaVA-OV-72B is an image captioning model published by ByteDance, Nanyang Technological University, Chinese University of Hong Kong (CUHK), and Hong Kong University of Science and Technology (HKUST) in 2024, featuring 72 billion parameters.
About LLaVA-OV-72B
We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision
Details
- Provider: ByteDance, Nanyang Technological University, Chinese University of Hong Kong (CUHK), Hong Kong University of Science and Technology (HKUST)
- Task: Image captioning, Visual question answering, Video description, Object recognition, Action recognition, Language modeling/generation
- Parameters: 72,000,000,000 (72B)
- Released: 2024-08-06
- Open weights: Yes
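
Because the weights are open, a practical first question is how much memory they occupy. The following is a rough sketch that turns the 72-billion parameter count listed in the Details into an approximate weight footprint. The bytes-per-parameter values and the `weight_memory_gb` helper are illustrative assumptions, not figures from the model's authors, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-memory estimate for a 72B-parameter model.
# The numeric formats and their sizes are assumptions about common
# storage precisions, not values stated in the model card.
PARAMS = 72_000_000_000  # parameter count from the Details list

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(params: int, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for fmt, size in BYTES_PER_PARAM.items():
        print(f"{fmt:>8}: {weight_memory_gb(PARAMS, size):.0f} GB")
```

Under these assumptions, 16-bit weights alone come to about 144 GB, so running the model would typically call for several accelerators or a quantized copy.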