Here is the line of work on partially observable linear dynamical systems (the LQG setting): from T^{2/3} regret (https://arxiv.org/pdf/2002.00082.pdf), to T^{1/2} with a completely new parameter-estimation method (https://arxiv.org/pdf/2003.05999.pdf), and finally to our log(T) regret bound (https://arxiv.org/pdf/2003.11227.pdf).
Replying to @kazizzad
Btw, the log(T) is cool, but isn't this one of those instances where the variance kills the log anyway? In other words, the mean is small, but the variance is just big. The same comment applies to finite-armed bandits. Just thinking aloud.
Replying to @CsabaSzepesvari
Ya, in bandits the variance can kill you; it can even be linear in T. For LDSs, I haven't fully understood the behavior of the variance yet; I've long been thinking about it. For the log(T) result, we don't encounter this issue, since regret is defined as "sum_t c_t(\pi_t) - sum_t c_t(\pi^*)" under the same realization of the noise.
Replying to @kazizzad
Hmm, I think the real definition of regret is the difference between the cost you could have suffered and the cost you actually suffered. In reality, the actual cost suffered does matter (and so does its variance). The variance will grow linearly even for the optimal controller. 1/2
Replying to @CsabaSzepesvari @kazizzad
Of course, there is no such thing as the "real definition" :) What I mean is that if we deploy the controller and measure how well it does, we will have a hard time arguing that the observed variance does not matter.
The actual cost incurred does depend on the variance of the environment. But the main takeaway is that combining episodic updates with online learning should do well. @SahinLale
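The distinction being debated can be made concrete with a toy simulation (my own illustrative sketch, not from any of the papers above): a scalar LQR problem where the learned and optimal controllers face the same noise path, so the pathwise regret "sum c_t(\pi_t) - sum c_t(\pi^*)" grows slowly, while the realized cumulative cost of even the optimal controller has variance growing linearly in T. The system parameters and the perturbed gain `k_hat` are arbitrary choices for illustration.

```python
import numpy as np

def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar discrete-time Riccati iteration for x' = a x + b u + w, cost q x^2 + r u^2."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)  # optimal feedback: u_t = -k x_t

def rollout(k, noise, a=0.9, b=1.0):
    """Run u_t = -k x_t against a FIXED noise path; return per-step costs x^2 + u^2."""
    x, costs = 0.0, []
    for w in noise:
        u = -k * x
        costs.append(x * x + u * u)
        x = a * x + b * u + w
    return np.array(costs)

k_star = lqr_gain(0.9, 1.0)   # optimal gain
k_hat = k_star + 0.3          # a (hypothetical) suboptimal estimated gain

rng = np.random.default_rng(0)
T = 2000
noise = rng.normal(size=T)

# Pathwise regret: BOTH controllers see the same noise realization.
regret = np.cumsum(rollout(k_hat, noise) - rollout(k_star, noise))

# Variance of the realized cumulative cost of the OPTIMAL controller,
# estimated over independent noise paths: grows roughly linearly in T.
paths = np.array([rollout(k_star, rng.normal(size=T)).cumsum() for _ in range(300)])
var_half, var_full = paths[:, T // 2 - 1].var(), paths[:, -1].var()
```

Under the same noise path the suboptimal gain loses a roughly constant amount per step, so `regret` is positive and well-behaved, while `var_full` is about twice `var_half`: the spread of what you actually pay doubles with the horizon even when you play optimally.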