Shivon Zilis
@shivon
·
Dec 13, 2020
OH: Why are language models so much bigger than computer vision models? ... Because a picture is worth a thousand words 🥁🤦🏻‍♀️
Lucas Beyer
@giffmana
Replying to
@shivon
Well actually... we found that a picture is only worth 16x16 words ;-)
arxiv.org
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either...
1:01 PM · Dec 13, 2020 · Twitter Web App