1/ How can we accurately compute policy gradients and their uncertainty in #ReinforcementLearning? We propose Deep Bayesian Quadrature Policy Optimization: https://arxiv.org/pdf/2006.15637
@kazizzad @yisongyue #AI #DeepLearning
2/ We model the Q-function as a DeepGP and choose a Fisher kernel to analytically estimate the policy gradient and its uncertainty as another DeepGP. This addresses the issue raised in Ilyas et al. (2020), https://openreview.net/pdf?id=ryxdEkHtPS, on the failure of traditional methods at estimating PGs.
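A minimal sketch of the idea in 2/, under stated assumptions: a plain GP with an RBF state kernel stands in for the DeepGP, the Fisher information matrix is estimated from the sampled score vectors, and the function and parameter names (`bq_policy_gradient`, `noise_var`, `jitter`) are hypothetical. With kernel k = k_state + k_fisher, the GP posterior over the policy gradient has a closed-form mean and covariance:

```python
import numpy as np

def rbf_kernel(states, lengthscale=1.0):
    # Stand-in state kernel (assumption): the thread uses a deep kernel instead.
    sq_dist = np.sum((states[:, None, :] - states[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)

def bq_policy_gradient(states, scores, q_mc, noise_var=1e-2, jitter=1e-6):
    """Bayesian-quadrature policy gradient estimate (illustrative sketch).

    states: (n, ds) sampled states
    scores: (n, d) score vectors u_i = grad_theta log pi(a_i | s_i)
    q_mc:   (n,)   Monte Carlo estimates of Q(s_i, a_i)
    Returns the posterior mean (d,) and covariance (d, d) of the gradient.
    """
    n, d = scores.shape
    U = scores.T                                  # (d, n)
    G = U @ U.T / n + jitter * np.eye(d)          # empirical Fisher information
    k_fisher = U.T @ np.linalg.solve(G, U)        # Fisher kernel u_i^T G^{-1} u_j
    K = rbf_kernel(states) + k_fisher             # additive state + Fisher kernel
    A = K + noise_var * np.eye(n)                 # add observation noise
    mean = U @ np.linalg.solve(A, q_mc)           # posterior mean of the gradient
    cov = G - U @ np.linalg.solve(A, U.T)         # posterior covariance (uncertainty)
    return mean, cov
```

The state kernel drops out of the integral because the score has zero mean under the policy, which is why only `U` and `G` appear outside the Gram matrix.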
We use Monte Carlo PG estimates and use the uncertainty to construct a trust region and adjust PG directions. Both improve performance and PG estimation. pic.twitter.com/L6Ewvh9gF5
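One plausible reading of "use the uncertainty to adjust PG directions", sketched as an assumption rather than the paper's exact update rule: whiten the gradient estimate by the inverse square root of its covariance, so that uncertain directions are shrunk, then cap the step length. The name `uncertainty_aware_step` and its parameters are hypothetical.

```python
import numpy as np

def uncertainty_aware_step(grad_mean, grad_cov, max_norm=0.1, jitter=1e-8):
    """Shrink uncertain gradient directions, then cap the step (illustrative)."""
    d = grad_mean.shape[0]
    # Symmetrize and regularize so the eigendecomposition is well behaved.
    sym_cov = 0.5 * (grad_cov + grad_cov.T) + jitter * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(sym_cov)
    eigvals = np.clip(eigvals, jitter, None)
    # C^{-1/2}: directions with large variance are scaled down.
    inv_sqrt_cov = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    direction = inv_sqrt_cov @ grad_mean
    # Simple norm cap as a stand-in for a trust-region constraint (assumption).
    norm = np.linalg.norm(direction)
    if norm > max_norm:
        direction = direction * (max_norm / norm)
    return direction
```

For example, with `grad_mean = [1, 1]` and `grad_cov = diag(1, 100)`, the second coordinate of the step is ten times smaller than the first, reflecting its higher variance.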