2/ We model the Q-function as a DeepGP and choose a Fisher kernel to analytically estimate the policy gradient (PG) and its uncertainty as another DeepGP. This addresses the issue raised in Ilyas et al. (2020) https://openreview.net/pdf?id=ryxdEkHtPS … on the failure of traditional methods to estimate PGs.
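For readers unfamiliar with the term, the Fisher kernel in its standard textbook form is sketched below. That the thread's authors use exactly this form is an assumption, and the symbols u, G, and θ are notation introduced here, not taken from the thread.

```latex
k\big((s,a),(s',a')\big) = u(s,a)^{\top} G^{-1}\, u(s',a'),
\qquad
u(s,a) = \nabla_{\theta} \log \pi_{\theta}(a \mid s),
\qquad
G = \mathbb{E}_{s,a}\!\left[ u(s,a)\, u(s,a)^{\top} \right]
```

Here u is the score function of the policy and G is its Fisher information matrix. Because the kernel is built from the same score vectors that appear in the policy gradient, a GP prior on Q with this kernel is what lets the gradient integral be worked out in closed form.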
We use Monte Carlo PG estimates, and we use their uncertainty to construct a trust region and adjust PG directions. Both improve performance and PG estimation. pic.twitter.com/L6Ewvh9gF5
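One way to picture the idea in the tweet above is the minimal sketch below. It is not the authors' algorithm: it assumes a diagonal (per-coordinate) uncertainty estimated from the spread of Monte Carlo gradient samples, and the function name `uncertainty_aware_step` and the parameters `delta` and `jitter` are hypothetical. Each coordinate of the mean gradient is divided by its variance, so noisy directions are down-weighted, and the step is rescaled to sit on the boundary of an uncertainty-weighted trust region.

```python
import math
from statistics import fmean, variance


def uncertainty_aware_step(grad_samples, delta=0.01, jitter=1e-8):
    """Return (step, var_mean) from Monte Carlo gradient samples.

    grad_samples: list of equal-length lists, one gradient estimate per sample
                  (at least two samples are needed to estimate a variance).
    delta:        trust-region radius (hypothetical parameter).
    jitter:       small constant keeping every variance strictly positive.
    """
    n = len(grad_samples)
    columns = list(zip(*grad_samples))  # one tuple per gradient coordinate

    # Monte Carlo mean of the gradient, coordinate by coordinate.
    g_mean = [fmean(col) for col in columns]

    # Variance of that mean: sample variance divided by the sample count.
    var_mean = [variance(col) / n + jitter for col in columns]

    # Adjust the direction: uncertain coordinates get a smaller weight.
    direction = [g / v for g, v in zip(g_mean, var_mean)]

    # Quadratic form g^T diag(var)^-1 g, used to normalise the step length.
    quad = sum(g * d for g, d in zip(g_mean, direction))
    if quad == 0.0:
        return [0.0] * len(g_mean), var_mean

    # Scale so that sum(step_i^2 * var_i) equals delta.
    scale = math.sqrt(delta / quad)
    return [scale * d for d in direction], var_mean


if __name__ == "__main__":
    samples = [[1.0, 0.5], [1.2, -0.5], [0.8, 1.5], [1.0, 0.1]]
    step, var_mean = uncertainty_aware_step(samples)
    print("step:", step)
    print("trust-region value:", sum(s * s * v for s, v in zip(step, var_mean)))
```

In this toy input the second coordinate is much noisier than the first, so it gets a smaller weight relative to its mean than the first coordinate does. A full covariance matrix, or the GP posterior covariance described in the thread, would replace the per-coordinate variance in a more faithful version.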
.@ravitej_17 is the primary author of this work.
Accurate analysis and opinion on behalf of all of us will be given by @karthiks. Over to you.
out of syllabus for me (any academic paper is out of syllabus for me)