Prof. Anima Anandkumar Retweeted Manish Prajapat
𝐂𝐨𝐦𝐩𝐞𝐭𝐢𝐭𝐢𝐯𝐞 𝐏𝐨𝐥𝐢𝐜𝐲 𝐎𝐩𝐭𝐢𝐦𝐢𝐳𝐚𝐭𝐢𝐨𝐧 in competitive MDPs #ReinforcementLearning. It provides stable optimization, convergence to sophisticated strategies @Manish8Prajapat @kazizzad @yisongyue https://sites.google.com/view/rl-copo #AI #DeepLearning @Caltech https://twitter.com/Manish8Prajapat/status/1290080065397469184 …
Prof. Anima Anandkumar added,
Manish Prajapat @Manish8Prajapat
How to train agents in two-player competitive MDPs? Use "Competitive Policy Optimization (CoPO)", a fundamental policy gradient approach:
P: https://arxiv.org/abs/2006.10611
Code: https://github.com/manish-pra
Post: https://sites.google.com/view/you-rl-copo … (1/3)
w/ @kazizzad @alexliniger @yisongyue @AnimaAnandkumar pic.twitter.com/5SUZ4r1RO7
9:29 AM - 3 Aug 2020
0 replies
5 retweets
42 likes