I am -loving- this combined Rectified Adam (RAdam) + LookAhead optimizer. It's freakishly stable and powerful, especially for my purposes. The only quirk to keep in mind: I've had to multiply my learning rates by over 100x to take full advantage. https://medium.com/@lessw/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d
I think unstable training settings in particular, where you find yourself falling back on something like RMSProp or SGD, would be a great place to start, since this should be significantly better (it is for me!). In theory, though, it should be generally useful.
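If you want to try the combination without pulling in the Ranger package, here is a minimal sketch of the idea, assuming PyTorch >= 1.10 (which ships torch.optim.RAdam). The Lookahead wrapper below is a simplified illustration of the technique, not the author's exact implementation, and the learning rate in the usage example is a placeholder you'd tune per the note above:

```python
# Minimal RAdam + Lookahead ("Ranger"-style) sketch.
# Assumes torch.optim.RAdam is available (PyTorch >= 1.10).
import torch


class Lookahead:
    """Wrap an inner optimizer; every k steps, pull a slow copy of the
    weights toward the fast weights (slow += alpha * (fast - slow)),
    then reset the fast weights to the slow copy."""

    def __init__(self, inner, k=6, alpha=0.5):
        self.inner = inner
        self.k = k
        self.alpha = alpha
        self.step_count = 0
        # Slow weights start as a copy of the current (fast) weights.
        self.slow = [
            [p.detach().clone() for p in group["params"]]
            for group in inner.param_groups
        ]

    def zero_grad(self):
        self.inner.zero_grad()

    def step(self):
        self.inner.step()  # normal RAdam update of the fast weights
        self.step_count += 1
        if self.step_count % self.k == 0:
            # Interpolate slow toward fast, then sync fast to slow.
            for group, slow_group in zip(self.inner.param_groups, self.slow):
                for p, slow_p in zip(group["params"], slow_group):
                    slow_p.add_(p.detach() - slow_p, alpha=self.alpha)
                    p.data.copy_(slow_p)


# Usage sketch: note the much larger lr than a typical Adam default,
# per the observation above (the exact multiplier is problem-dependent).
model = torch.nn.Linear(10, 1)
opt = Lookahead(torch.optim.RAdam(model.parameters(), lr=0.1), k=6, alpha=0.5)

x, y = torch.randn(32, 10), torch.randn(32, 1)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```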