Tired of tuning your neural network optimizer? Wish there was an optimizer that just worked? We’re excited to release VeLO 🚲, the first hyperparameter-free learned optimizer that outperforms hand-designed optimizers on real-world problems: velo-code.github.io
🧵
VeLO is a versatile learned optimizer: it is a neural network that takes in gradients (and other features) and outputs weight updates. It was meta-learned by training on thousands of optimization tasks, spanning a huge range of models and datasets.
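To make "a neural network that takes in gradients and outputs weight updates" concrete, here is a toy sketch of the general idea. This is not VeLO's actual architecture; the per-parameter MLP, the feature set, and all shapes below are illustrative assumptions:

```python
import jax
import jax.numpy as jnp

def learned_update(opt_params, grad, momentum, step):
    """Toy learned optimizer: a small MLP maps per-parameter gradient
    features to a weight update. Illustrative only; VeLO's real
    architecture, features, and meta-training are more involved."""
    # Per-parameter input features: gradient, momentum, and a timestep encoding.
    feats = jnp.stack(
        [grad, momentum, jnp.full_like(grad, jnp.log1p(step))], axis=-1)
    w1, b1, w2, b2 = opt_params
    hidden = jax.nn.relu(feats @ w1 + b1)
    out = hidden @ w2 + b2
    # Emit a direction and a positive per-parameter step size, so the
    # effective learning rate can vary across parameters and over time.
    direction, log_step = out[..., 0], out[..., 1]
    return direction * jnp.exp(log_step)

# Usage in a training loop: theta = theta - learned_update(opt_params, g, m, t).
# The analogue of opt_params in VeLO is meta-learned across thousands of
# tasks rather than hand-designed.
```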
To test VeLO’s performance, we also release VeLOdrome: a new optimizer benchmark consisting of a broad set of held-out neural network training problems. VeLO performs better than nearly all tuned baselines on all problems, without any hyperparameters!
This work was a collaboration, led by and featuring contributions from many researchers. If you want to learn more or try it out, check out our paper or code: velo-code.github.io
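For a sense of what trying it out looks like, here is a minimal sketch based on our reading of the released learned_optimization code. The import path and the `optax_lopt` call are assumptions taken from the repo; its update may also consume extra signals (e.g. the current loss), so defer to the README at velo-code.github.io for the authoritative API:

```python
# Hedged sketch: using the released VeLO via an optax-style wrapper.
import jax
import jax.numpy as jnp
import optax
from learned_optimization.research.general_lopt import prefab  # assumed path

NUM_STEPS = 1_000
# VeLO conditions on the total training length, so it is passed up front.
opt = prefab.optax_lopt(NUM_STEPS)  # assumed optax-style GradientTransformation

params = {"w": jnp.zeros((8,))}
opt_state = opt.init(params)

def loss_fn(p):
    return jnp.sum((p["w"] - 1.0) ** 2)

for _ in range(NUM_STEPS):
    grads = jax.grad(loss_fn)(params)
    updates, opt_state = opt.update(grads, opt_state, params=params)
    params = optax.apply_updates(params, updates)
```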
Does this also replace learning rate schedulers?
Yes! Update step sizes are automatically selected by VeLO, and vary across time, layers, and architectures.
Great work! Though it seems to be very expensive, as it computes both backprop and an LSTM pass. What does the training curve look like if the x-axis is normalized by compute / wall-clock time?
The optimizer's overhead is ~10x that of Adam, which is usually small compared to the cost of computing the gradients for the model you are applying VeLO to train. See the leftmost pane of the attached plot.
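A back-of-envelope version of that argument (our numbers, not the paper's): an elementwise optimizer touches each parameter a constant number of times per step, while the gradient computation scales with parameters × tokens, so even 10x Adam's constant stays tiny:

```python
# Rough FLOPs comparison; every constant here is an illustrative assumption.
P = 100e6             # model parameters (hypothetical 100M-param model)
tokens = 512 * 1024   # tokens per batch (hypothetical)

grad_flops = 6 * P * tokens   # transformer rule of thumb: fwd+bwd ~ 6*P*tokens
adam_flops = 10 * P           # assume ~10 FLOPs per parameter for Adam
velo_flops = 10 * adam_flops  # ~10x Adam's overhead, per the thread

print(f"VeLO update is {velo_flops / grad_flops:.4%} of a gradient step")
# -> roughly 0.003% under these assumptions, i.e. negligible per step.
```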
Whoa! I like that premise. Excited about reading and taking this for a spin!
I'm excited to try this out. I've actually thought a bit about building an RL optimizer (vs LSTM here).