all the flashy DeepMind papers are like "we used 12 intermediate losses and a 5-stage architecture", meanwhile OpenAI papers are like "we made the model bigger"