1/ Can we build video prediction models by masked visual pretraining via Transformers?
We present MaskViT: a simple & parameter-efficient method to generate high-res. videos in real time.
Paper: arxiv.org/abs/2206.11894
Web: maskedvit.github.io 🧵👇