the party don't stop
Quote Tweet
Training Compute-Optimal Large Language Models: trains Chinchilla, which is Gopher w/ the same compute budget but with 70B parameters and 4x more data. It significantly outperforms Gopher, e.g. by >7% on MMLU. arxiv.org/abs/2203.15556
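For intuition, here is a minimal sketch of the trade-off the quoted tweet describes, assuming the common approximation that training compute C ≈ 6·N·D FLOPs (N = parameters, D = tokens) and the paper's finding that N and D should scale roughly equally, which works out to about 20 tokens per parameter. The function name and the budget value are illustrative, not from the paper's code.

```python
# Rough sketch of the Chinchilla compute-optimal rule of thumb.
# Assumptions: C ~= 6 * N * D, and D / N ~= 20 (tokens per parameter).

def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    """Return (parameters, tokens) that spend c_flops compute-optimally
    under C = 6 * N * D with D = tokens_per_param * N."""
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # Gopher/Chinchilla-scale budget: ~5.76e23 FLOPs (6 * 70e9 * ~1.37e12).
    n, d = compute_optimal(5.76e23)
    print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
    # -> ~7e10 params and ~1.4e12 tokens, i.e. Chinchilla's 70B / 1.4T,
    #    vs. Gopher's 280B params on only 300B tokens at similar compute.
```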