the party don't stop
Quote Tweet
Training Compute-Optimal Large Language Models
Trains Chinchilla, which uses the same compute budget as Gopher but with 70B parameters (4x fewer) and 4x more data. It significantly outperforms Gopher, e.g. by >7% on MMLU.
arxiv.org/abs/2203.15556
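For context on why "4x fewer parameters, 4x more data" keeps the budget fixed: the paper's rule of thumb is that training compute scales as C ≈ 6ND (FLOPs ≈ 6 × parameters × tokens). A minimal Python sketch, using the parameter and token counts reported in the paper (280B params / 300B tokens for Gopher, 70B params / 1.4T tokens for Chinchilla), not anything from the tweet itself:

```python
def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute via the C ~ 6*N*D rule of thumb."""
    return 6 * params * tokens

# Model sizes and token counts as reported in Hoffmann et al. (2022).
gopher = train_flops(280e9, 300e9)      # Gopher: 280B params, 300B tokens
chinchilla = train_flops(70e9, 1.4e12)  # Chinchilla: 70B params, 1.4T tokens

print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")
print(f"ratio: {chinchilla / gopher:.2f}")  # ~1.17, i.e. comparable budgets
```

Cutting N by 4x while raising D by roughly 4x leaves the N·D product (and hence the compute budget) about the same, which is exactly the trade the paper argues is compute-optimal.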