Updated paper on implicit competitive regularization in #GANs: https://arxiv.org/abs/1910.05852 Blog: https://f-t-s.github.io/projects/icr/ GANs work due to the simultaneous optimization of generator and discriminator, not the choice of divergence. With Florian Schaefer @Kay12400259 @Caltech
-
GANs work when both agents are updated simultaneously. If one agent is fixed while the other is updated, performance worsens. Hence, GANs do not converge to a Nash equilibrium. Instead, the implicit regularization arising from simultaneous updates is key to good performance.
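The non-convergence to Nash can be seen already in the simplest zero-sum game. A minimal sketch (a toy example for illustration, not code from the paper): simultaneous gradient descent-ascent on f(x, y) = x·y spirals away from the Nash equilibrium at the origin.

```python
import numpy as np

# Toy illustration (not from the paper): in the zero-sum game
# f(x, y) = x * y, where x minimizes and y maximizes, the unique Nash
# equilibrium is (0, 0). Simultaneous gradient descent-ascent does not
# converge to it -- the iterates spiral outward instead.

def simultaneous_gda(x, y, lr=0.1, steps=100):
    for _ in range(steps):
        gx, gy = y, x                      # df/dx and df/dy
        x, y = x - lr * gx, y + lr * gy    # both players step at once
    return x, y

x0, y0 = 1.0, 1.0
x, y = simultaneous_gda(x0, y0)
r0, r = np.hypot(x0, y0), np.hypot(x, y)
print(r > r0)  # True: distance from the Nash point has grown
```

Each step multiplies the iterate by a matrix with eigenvalues of magnitude sqrt(1 + lr²) > 1, so the distance to the equilibrium strictly increases.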
-
Implicit competitive regularization in GANs encourages the discriminator to learn slowly. It is more effective than explicit gradient penalties. Competitive gradient descent further strengthens this implicit regularization by using the mixed Hessian to account for the interaction between the players.
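A hedged sketch of how the mixed Hessian enters the update: below is a scalar toy version of the zero-sum CGD update, written from the paper's description rather than taken from the authors' code. For f(x, y) = x·y (x minimizes, y maximizes) the mixed second derivatives are constants, and the CGD iterates converge to the Nash point that plain gradient descent-ascent spirals away from.

```python
# CGD update for the zero-sum bilinear game f(x, y) = x * y
# (x minimizes, y maximizes; Nash equilibrium at (0, 0)).
# Scalar toy sketch written from the paper's description:
# the mixed Hessians D_xy f = D_yx f = 1, so the matrix inverse
# in the CGD update reduces to a scalar division.

def cgd_step(x, y, eta=0.5):
    gx, gy = y, x            # gradients of f w.r.t. x and y
    dxy = dyx = 1.0          # mixed second derivatives of f
    denom = 1.0 + eta**2 * dxy * dyx
    x_new = x - eta * (gx + eta * dxy * gy) / denom
    y_new = y + eta * (gy - eta * dyx * gx) / denom
    return x_new, y_new

x, y = 1.0, 1.0
for _ in range(200):
    x, y = cgd_step(x, y)
print(abs(x) < 1e-6 and abs(y) < 1e-6)  # True: converged to Nash
```

The correction terms eta * dxy * gy and eta * dyx * gx are each player's anticipation of the opponent's move; they turn the outward spiral into a contraction.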
-
For details on competitive gradient descent, check out this thread: https://twitter.com/AnimaAnandkumar/status/1205173860284293121?s=20
Prof. Anima Anandkumar added,
Prof. Anima Anandkumar @AnimaAnandkumar: Florian Schaefer and I have a #NeurIPS2019 poster (#195) today from 10:45 to 12:45. GAN training is unstable and prone to mode collapse. How can we fix optimization in GANs? We propose competitive gradient descent: each update is a Nash equilibrium of a local game. Blog: https://f-t-s.github.io/projects/cgd/
-
Replying to @AnimaAnandkumar
The interpretation behind the second-order derivatives in your work is really interesting, and the results are amazing. An optimization question: isn't CGD much slower than GD per iteration? As I understand it, you compute D_{xy}, which is really slow. Also, what is the case for neural nets?
-
Replying to @mttoghani
CGD is only twice the cost of GD when mixed-mode differentiation is available, as in #JAX. Further, with conjugate gradients, the iteration reduces to GD when there is no instability. https://twitter.com/AnimaAnandkumar/status/1219133562651176965?s=20
Prof. Anima Anandkumar added,
Prof. Anima Anandkumar @AnimaAnandkumar: We used JAX for competitive gradient descent (CGD) with Hessian-vector products. Mixed-mode differentiation in JAX makes this efficient (just twice the cost of backprop). We used CGD for training GANs and for constrained problems in RL. This library will be very useful @pierrelux https://twitter.com/hardmaru/status/1219088493583880192
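The Hessian-vector-product trick the tweet refers to can be sketched in a few lines of JAX. This is a minimal illustration on a toy function (the function f and variable names are illustrative assumptions, not code from the CGD repository): composing a forward-mode jvp over a reverse-mode grad gives H @ v without ever forming the Hessian, at roughly twice the cost of one backprop.

```python
import jax
import jax.numpy as jnp

# Mixed-mode (forward-over-reverse) Hessian-vector product in JAX:
# one reverse pass builds grad(f), one forward pass through it yields
# H(x) @ v without materializing the Hessian. This primitive is what
# keeps a CGD step at roughly twice the cost of plain backprop.
# Toy function for illustration only.

def f(x):
    return 0.5 * jnp.dot(x, x) + jnp.sum(x ** 3)

def hvp(f, x, v):
    # jax.jvp of jax.grad(f) returns (grad f(x), H(x) @ v)
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

x = jnp.array([1.0, 2.0])
v = jnp.array([1.0, 0.0])
print(hvp(f, x, v))  # Hessian is I + 6*diag(x), so H @ v = [7., 0.]
```

The same hvp, fed to a conjugate-gradient solver, is how the matrix inverse in the CGD update can be applied without forming D_xy explicitly.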
Our current @PyTorch implementation of CGD is also quite fast: https://github.com/devzhk/Implicit-Competitive-Regularization