Interesting! Is this related to BAMDP? cc @luisa_zintgraf @shimon8282 @yaringal
-
-
-
Yes! I think this paper will be very helpful to read if you typically think of "Bayesian" RL as something separate from the "usual" problem... On some level, it's just choosing between an average-case or worst-case loss! pic.twitter.com/QsM9qUQGYr
-
-
You mean that the way probabilistic inference is described in prior works, the inference is used for planning and not learning? I'd think one can fix this. My bigger complaint for this literature is that they do not even try to address compute cost, not even in simple cases.
-
@bodonoghue85 may have some more thoughts, but I don't think we've worked through this yet! I got into this because of a genuine confusion at conferences where people would talk about "sampling from the posterior" in RL... But their algorithm didn't seem like Thompson sampling!
-
-
Separately, in a non-Bayesian approach, there are fast ways to perform probabilistic RL inference here: https://arxiv.org/abs/1202.3720. Furthermore, how probabilistic inference RL relates to policy gradients is explored in https://papers.nips.cc/paper/4576-a-unifying-perspective-of-parametric-policy-search-methods-for-markov-decision-processes … and http://www.jmlr.org/papers/volume17/15-414/15-414.pdf …
-
I also like the work by Marc Toussaint https://dl.acm.org/citation.cfm?id=1553508 …
-
-
Is it related to probability collectives?
-
Maybe... But I don't know what that is!
-
-
Does this basically come down to aleatoric vs. epistemic uncertainty? (with RL as Inference ignoring the distinction?)
-
It's slightly to do with that... But I would say the issue is that "RL as Inference" introduces this auxiliary notion of "probability of optimal" that has nothing to do with *either* aleatoric or epistemic uncertainty. It's more like a "dummy variable" that reframes policy gradient.