François Chollet Retweeted Quoc Le
This is neat: automatically searching over the space of possible TensorFlow models and finding a Transformer architecture that achieves 3-4x efficiency gains. https://twitter.com/quocleix/status/1440024037896245248
François Chollet added,
Quoc Le @quocleix
Primer: Searching for Efficient Transformers for Language Modeling
http://arxiv.org/abs/2109.08668
We use evolution to design a new Transformer variant, called Primer. Primer has a better scaling law, and is 3X to 4X faster for training than Transformer for language modeling. pic.twitter.com/daoHnuQEjS
11:16 PM - 20 Sep 2021
0 replies
24 retweets
181 likes