1/ How can we accurately compute policy gradients and their uncertainty in #ReinforcementLearning? We propose Deep Bayesian Quadrature Policy Optimization: https://arxiv.org/pdf/2006.15637
@kazizzad @yisongyue #AI #DeepLearning
We use Monte Carlo PG estimates and use uncertainty to construct a trust region and adjust PG directions. Both improve performance and PG estimation. pic.twitter.com/L6Ewvh9gF5
@ravitej_17 is the primary author of this work.