GPT-3: Language Models are Few-Shot Learners, by Brown et al.
“We train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting.”
arxiv.org/abs/2005.14165
"Unfortunately, a bug in the filtering caused us to ignore some overlaps, and due to the cost of training it was not feasible to retrain the model."
It is interesting to consider the new dilemmas that arise in the "train it once" regime.
you'd need to buy another hard drive just to store the parameters of the full model. It's +600 GB, right?
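(A quick sanity check on that figure, as a sketch: the bytes-per-parameter values below are assumptions about storage precision, since the paper doesn't specify a serialization format.)

# Back-of-envelope: disk space needed just for GPT-3's weights.
N_PARAMS = 175e9  # 175 billion parameters, per the paper

for precision, bytes_per_param in [("fp32", 4), ("fp16", 2)]:
    size_gb = N_PARAMS * bytes_per_param / 1e9
    print(f"{precision}: ~{size_gb:.0f} GB")

# fp32: ~700 GB
# fp16: ~350 GB

So "+600 GB" is roughly consistent with storing the weights in single precision; half precision would bring it down to ~350 GB.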
I did a quick back-of-the-envelope calculation and concluded that it would cost around $2M to retrain their largest model using AWS p2.16xlarge instances. Nice.
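For anyone who wants to redo an estimate like this: the total training compute (~3.14e23 FLOPs) is reported in the paper's compute appendix, but the instance throughput, utilization, and hourly price below are placeholder assumptions rather than actual AWS specs or quotes, so plug in your own numbers.

# Back-of-envelope retraining cost sketch.
TOTAL_FLOPS = 3.14e23       # reported total training compute for GPT-3 175B

PEAK_FLOPS_PER_SEC = 100e12  # assumed peak throughput of one instance (placeholder)
UTILIZATION = 0.3            # assumed fraction of peak actually sustained
PRICE_PER_HOUR_USD = 15.0    # assumed on-demand hourly price (placeholder)

instance_hours = TOTAL_FLOPS / (PEAK_FLOPS_PER_SEC * UTILIZATION) / 3600
cost = instance_hours * PRICE_PER_HOUR_USD
print(f"~{instance_hours:,.0f} instance-hours, roughly ${cost:,.0f}")

Estimates like this swing by an order of magnitude depending on the assumed throughput, utilization, and pricing, which is why published cost figures for training GPT-3 vary so widely.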
Super impressive, but I'm not very happy that the trend towards massive models continues.
Hey, I'm extremely disappointed that you guys didn't decide to include the Once and Future String pun in this paper. Would have been a double pun! (a: LMs take in a string once and predict the future string, b: the return of GPT as King of expensive LMs)
Quote Tweet
Attention deep NLP people: I've got a great title for a language modeling paper -- all I'm missing is the method. It's called
"The Once and Future String."
Hit me up if you've got any good models and I'm happy to split the authorship 50-50 on the NeurIPS submission.