Tweets
-
A way to mitigate gradient interference in multi-task learning, which works well in supervised and reinforcement learning. Pretty simple to set up too! https://twitter.com/chelseabfinn/status/1219812204372869120
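A minimal numpy sketch of the gradient-projection idea behind this (the `pcgrad_update` helper and the toy gradients are illustrative, not the authors' released code): when two task gradients conflict, remove the interfering component before summing.

```python
import random
import numpy as np

def pcgrad_update(task_grads):
    """Sketch of 'gradient surgery': if two task gradients point in opposing
    directions (negative dot product), project one onto the normal plane of
    the other before combining them."""
    projected = []
    for i, g_i in enumerate(task_grads):
        g = g_i.astype(float).copy()
        others = [j for j in range(len(task_grads)) if j != i]
        random.shuffle(others)  # projection order is randomized
        for j in others:
            g_j = task_grads[j]
            dot = float(g @ g_j)
            if dot < 0:  # conflicting gradients: drop the interfering component
                g -= dot / (np.linalg.norm(g_j) ** 2 + 1e-12) * g_j
        projected.append(g)
    return np.sum(projected, axis=0)

# toy usage with two conflicting task gradients
print(pcgrad_update([np.array([1.0, 1.0]), np.array([-1.0, 0.5])]))
```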
-
Models that can infer "latent robot actions" in human videos (i.e., given this video of a person using a tool, what action would the robot execute to use the tool in the same way?). Especially check out the videos on the website: https://sites.google.com/view/lpmfoai https://twitter.com/chelseabfinn/status/1212578725130207233
-
Related to RCPs, Aviral Kumar also released model inversion networks (MINs): https://arxiv.org/abs/1912.13464 Model-based optimization over high-dimensional inputs, based on learning "inverse maps" -- instead of learning the score y of x, learn the input x that goes with a given score y! pic.twitter.com/tDOMwLFeVt
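A hedged sketch of the "inverse map" idea: a deterministic regressor stands in below for the paper's conditional generative model, and the dataset, network sizes, and target score are made up for illustration.

```python
import torch
import torch.nn as nn

# Made-up dataset of (design x, score y) pairs; in practice x would be a
# high-dimensional input and y its measured objective value.
xs = torch.randn(1024, 8)
ys = (xs ** 2).sum(dim=1, keepdim=True)

# Inverse map: predict a design x from a desired score y
# (the reverse of the usual forward model y = f(x)).
inverse_map = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 8))
opt = torch.optim.Adam(inverse_map.parameters(), lr=1e-3)

for _ in range(200):
    loss = ((inverse_map(ys) - xs) ** 2).mean()  # regress x given its score y
    opt.zero_grad()
    loss.backward()
    opt.step()

# Model-based optimization step: query the inverse map with a high target
# score to propose a promising design.
candidate_design = inverse_map(torch.tensor([[30.0]]))
```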
-
Also worth mentioning concurrent work on this: https://arxiv.org/abs/1912.02877 And our own parallel paper that explains a related idea for model-based optimization (which was on openreview before): https://arxiv.org/abs/1912.13464
-
The method we get from this is not as effective as state-of-the-art RL, but it is simple -- just supervised learning conditioned on return or advantage. There is also an interesting derivation that explains why this works, involving 2 ways to factorize p(tau, R) (see Sec 4.3)
-
Can suboptimal trials serve as optimal demos? Suboptimal trials are "optimal" for a policy that *aims* to be suboptimal. By conditioning the policy on the reward (or advantage) we want it to get, we can use all trials as demos: https://arxiv.org/abs/1912.13465 w/ Aviral Kumar & @xbpeng
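A minimal supervised-learning sketch of the return-conditioned idea; the state/action sizes, the random toy data, and the target-return query are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Policy that takes the state *and* the return we want it to achieve.
state_dim, action_dim = 4, 2
policy = nn.Sequential(nn.Linear(state_dim + 1, 64), nn.Tanh(),
                       nn.Linear(64, action_dim))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Toy "trials": states/actions with the returns they actually achieved.
states = torch.randn(256, state_dim)
actions = torch.randn(256, action_dim)
returns = torch.randn(256, 1)

for _ in range(100):
    # Plain supervised learning: every trial is an optimal demo *for the
    # return it obtained*, so we imitate it conditioned on that return.
    pred = policy(torch.cat([states, returns], dim=-1))
    loss = ((pred - actions) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# At test time, condition on a high target return to ask for good behavior.
action = policy(torch.cat([torch.randn(1, state_dim),
                           torch.tensor([[10.0]])], dim=-1))
```

-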
Check out Glen's post "Emergent Behavior by Minimizing Chaos": https://bair.berkeley.edu/blog/2019/12/18/smirl/ Minimizing surprise can lead to complex behaviors that maintain homeostasis, like playing VizDoom, balancing, or learning to play Tetris, all w/o any reward. Paper: https://arxiv.org/abs/1912.05510 pic.twitter.com/Vf6NBXie7Z
-
Sergey Levine Retweeted
I'll be discussing this work and other challenges in meta-learning at the @NeurIPSConf Bayesian Deep Learning Workshop at 1:20 pm, West Exhibition Hall C. http://bayesiandeeplearning.org/ https://twitter.com/chelseabfinn/status/1204284213739933697
-
Laura & Marvin’s blog post on this work is now out: https://bair.berkeley.edu/blog/2019/12/13/humans-cyclegan/
-
Goal-conditioned policies can be learned with RL. But it's simpler to use imitation (supervised learning). Turns out that you can get good goal-conditioned policies by imitating your *own* data, without any human demonstrations! We call it GCSL: https://arxiv.org/abs/1912.06088 pic.twitter.com/AQlGeMcfet
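A rough sketch of the self-imitation/relabeling step as described above; the data format and helper name are assumptions for illustration, not the released GCSL code.

```python
def relabel_own_data(states, actions):
    """Hindsight relabeling for goal-conditioned supervised learning:
    the action taken at time t is treated as correct supervision for
    reaching any state the agent actually visited later in the same
    trajectory, so the agent's own (even unsuccessful) rollouts become
    'demonstrations' for some goal."""
    dataset = []
    for t in range(len(actions)):
        for later in range(t + 1, len(states)):
            goal = states[later]
            dataset.append((states[t], goal, actions[t]))
    return dataset  # then train pi(a | s, g) by ordinary supervised learning
```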
-
Model-Based Reinforcement Learning: Theory and Practice https://bair.berkeley.edu/blog/2019/12/12/mbpo/ New blog post by @michaeljanner about the taxonomy of model-based RL methods, when we should use models, and the state-of-the-art MBPO algorithm for sample-efficient RL.
-
Arxiv: https://arxiv.org/abs/1912.05510 w/ @GlenBerseth, D. Geng, C. Devin, @chelseabfinn, @dineshjayaraman
-
Typically, we think of intrinsic motivation as _maximizing_ surprise. But agents in complex worlds with unexpected events can learn meaningful behaviors by _minimizing_ surprise, leading to behaviors that seek out homeostasis: https://sites.google.com/view/surpriseminimization Can learn VizDoom w/o reward. pic.twitter.com/M0iXzrqlDg
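A toy sketch of the surprise-minimization objective: reward the agent for landing in states that are likely under a running density model of its own state history. A diagonal Gaussian stands in for the density model here; the class name and update rule are illustrative, not the paper's code.

```python
import numpy as np

class SurpriseMinimizer:
    def __init__(self, dim):
        self.count = 1
        self.mean = np.zeros(dim)
        self.m2 = np.ones(dim)  # running sum of squared deviations

    def reward(self, state):
        # Intrinsic reward = log-likelihood of the new state under the
        # current state density: familiar, stable states score highest.
        var = self.m2 / self.count + 1e-6
        return -0.5 * np.sum(np.log(2 * np.pi * var)
                             + (state - self.mean) ** 2 / var)

    def update(self, state):
        # Welford-style running estimate of the visited-state distribution.
        self.count += 1
        delta = state - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (state - self.mean)

# toy usage: repeated visits to similar states yield higher reward over time
model = SurpriseMinimizer(dim=2)
for s in np.random.normal(0.0, 0.1, size=(50, 2)):
    r = model.reward(s)
    model.update(s)
```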
-
How can we get robots to imitate humans? CycleGAN can turn human videos into robot videos, and model-based RL can execute actions that track these videos. w/ Laura Smith, N. Dhawan, M. Zhang, @pabbeel https://sites.google.com/view/icra20avid https://arxiv.org/abs/1912.04443 pic.twitter.com/CKX6OiDMTb
-
Goal-conditioned policies can also be used to plan over graphs constructed from a replay buffer! Presented tomorrow (Thu) by Ben Eysenbach, 10:45 am, poster #220 at #NeurIPS2019. Colab here: http://bit.ly/rl_search pic.twitter.com/gd6yIprmQH
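A toy sketch of the planning-over-a-buffer-graph idea: nodes are states from the replay buffer, edge weights come from a learned goal-conditioned distance (plain Euclidean distance stands in for it below), and shortest-path search yields a sequence of subgoals. The function name and the `max_dist` cutoff are assumptions for illustration.

```python
import itertools
import networkx as nx
import numpy as np

def build_goal_graph(buffer_states, distance_fn, max_dist=5.0):
    # Connect two buffer states when the (learned) distance between them is
    # small enough that the goal-conditioned policy can traverse the edge.
    g = nx.DiGraph()
    for i, j in itertools.permutations(range(len(buffer_states)), 2):
        d = distance_fn(buffer_states[i], buffer_states[j])
        if d < max_dist:
            g.add_edge(i, j, weight=d)
    return g

# toy usage: Euclidean distance stands in for the learned value/distance
states = [np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([6.0, 0.0])]
graph = build_goal_graph(states, lambda a, b: float(np.linalg.norm(a - b)))
subgoals = nx.shortest_path(graph, source=0, target=2, weight="weight")  # [0, 1, 2]
```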
-
If we learn visual goal-conditioned policies, we can plan over goals and solve long-horizon tasks. Come see Soroush Nasiriany present LEAP tomorrow at 10:45 am, poster #218 at #NeurIPS2019: https://drive.google.com/file/d/15eW0gRNC85Wct3Mt8M-9PVQfgRceT_s4/view
-
Come learn about how to effectively learn policies from entirely off-policy data with Bootstrap Error Accumulation Reduction (BEAR), presented tomorrow (Wed) at #NeurIPS2019 by Aviral Kumar, at 5:30 pm, poster #214. pic.twitter.com/fGpvmlXWOC
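The tweet doesn't spell out the mechanism, but the general support-matching idea behind this line of offline RL work can be sketched as a kernel MMD penalty between actions sampled from the learned policy and actions from the dataset. The kernel choice, bandwidth, and shapes below are illustrative assumptions, not the paper's exact setup.

```python
import torch

def mmd_penalty(policy_actions, data_actions, bandwidth=20.0):
    # Kernel MMD between the two sets of action samples: small when the
    # learned policy stays within the support of the behavior data.
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b, p=1) / bandwidth)  # Laplacian kernel
    return (kernel(policy_actions, policy_actions).mean()
            + kernel(data_actions, data_actions).mean()
            - 2.0 * kernel(policy_actions, data_actions).mean())

# toy usage: add this penalty to the policy loss to discourage
# out-of-support actions when training purely from logged data
pi_a = torch.randn(32, 6, requires_grad=True)
data_a = torch.randn(32, 6)
loss = mmd_penalty(pi_a, data_a)
loss.backward()
```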
-
Compositional plan vectors allow us to add (and subtract) "plans" in a learned embedding space, enabling one-shot imitation and composition of tasks. Find out how to train CPVs tomorrow, at our poster presentation (w/ Coline Devin and Daniel Geng). At #NeurIPS2019, 10:45 am, poster #191. pic.twitter.com/jEw2Or91HZ
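A tiny sketch of the plan-vector arithmetic as summarized above: condition the policy on the embedding of the reference task minus the embedding of what has been executed so far, and compose tasks by adding their vectors. Encoder and policy shapes and names are made up for illustration.

```python
import torch
import torch.nn as nn

traj_encoder = nn.Linear(16, 32)  # embeds a (flattened) trajectory into a plan vector
policy = nn.Linear(8 + 32, 4)     # acts from the state plus the remaining-plan vector

def act(state, reference_traj, executed_so_far):
    # What remains to be done = plan vector of the full demonstrated task
    # minus the plan vector of the part already executed.
    remaining = traj_encoder(reference_traj) - traj_encoder(executed_so_far)
    return policy(torch.cat([state, remaining], dim=-1))

def compose(traj_a, traj_b):
    # Composing two tasks is just adding their plan vectors.
    return traj_encoder(traj_a) + traj_encoder(traj_b)

# toy usage
a = act(torch.randn(1, 8), torch.randn(1, 16), torch.zeros(1, 16))
```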
-
Unsupervised meta-RL with images: use variational "task clustering" to auto-generate tasks for meta-learning without any reward function, and then learn to adapt quickly to new tasks in environments with visual observations. Presented tomorrow (spotlight + poster): https://twitter.com/chelseabfinn/status/1204665485943431168
-
Come find out how HRL algorithms can utilize multiple primitives at the same time, and solve some complex long-horizon control problems, like T-rex ball dribbling! https://twitter.com/xbpeng4/status/1204647531306577920
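One way to sketch "using multiple primitives at the same time" is a weighted product of Gaussian action primitives, which is itself Gaussian, so every primitive can contribute to each action instead of one being selected. The function below is a toy illustration of that composition rule under these assumptions, not the authors' code.

```python
import numpy as np

def compose_gaussian_primitives(means, stds, weights):
    # Product of weighted Gaussian primitives: all primitives stay active,
    # each contributing in proportion to weight / variance.
    means, stds, weights = np.asarray(means), np.asarray(stds), np.asarray(weights)
    precisions = weights[:, None] / stds ** 2
    total_precision = precisions.sum(axis=0)
    mean = (precisions * means).sum(axis=0) / total_precision
    return mean, np.sqrt(1.0 / total_precision)

# toy usage: two 2-D action primitives blended with unequal weights
mu, sigma = compose_gaussian_primitives(
    means=[[1.0, 0.0], [0.0, 1.0]],
    stds=[[0.5, 0.5], [1.0, 1.0]],
    weights=[0.7, 0.3])
```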