Learning Video Representations from Large Language Models
Introduces LaViLa, a new approach to learning video-language representations by leveraging LLMs. Achieves state-of-the-art results on a wide range of video tasks.
proj: facebookresearch.github.io/LaViLa/
abs: arxiv.org/abs/2212.04501
code: github.com/facebookresear
LaViLa is an approach to learning video-language representations using pre-trained LLMs that outperforms the state-of-the-art on multiple first- and third-person video tasks.


