Sergey Levine

@svlevine

Assistant Professor at UC Berkeley

Berkeley, CA
Joined April 2018

Tweets


  1. Jan 22

    A way to mitigate gradient interference in multi-task learning, which works well in supervised and reinforcement learning. Pretty simple to set up too!

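    A minimal sketch of one way such gradient interference can be mitigated, assuming the mechanism is projecting out conflicting per-task gradient components (the function name and the final averaging step are illustrative, not taken from the paper):

    import numpy as np

    def project_out_conflicts(task_grads):
        """For each task's gradient, remove the component that points against
        any other task's gradient (negative dot product), then average."""
        adjusted = []
        for i, g in enumerate(task_grads):
            g = g.copy()
            for j, other in enumerate(task_grads):
                if i == j:
                    continue
                dot = g @ other
                if dot < 0:  # the two gradients conflict
                    g -= dot / (other @ other + 1e-12) * other
            adjusted.append(g)
        return np.mean(adjusted, axis=0)

    # usage (hypothetical): grads = [flattened gradient of each task's loss]
    # params -= lr * project_out_conflicts(grads)
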
  2. Jan 3

    Models that can infer "latent robot actions" in human videos (i.e., given this video of a person using a tool, what action would the robot execute to use the tool in the same way?). Esp check out the videos on the website:

  3. Jan 2

    Related to RCPs, Aviral Kumar also released model inversion networks (MINs): Model-based optimization over high-dimensional inputs, based on learning "inverse maps" -- instead of learning the score y of x, learn the input x that goes with a given score y!

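    A rough sketch of the inverse-map idea as stated above: fit a network that maps a score y back to an input x, then query it with a high target score. This simplified version omits the stochastic/latent-variable machinery a full method would need; the class, sizes, and training loop are illustrative assumptions, written in PyTorch.

    import torch
    import torch.nn as nn

    class InverseMap(nn.Module):
        """Maps a scalar score y to a candidate input x, i.e., x = g(y)."""
        def __init__(self, x_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(1, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, x_dim),
            )

        def forward(self, y):
            return self.net(y)

    def train_step(model, optimizer, x_batch, y_batch):
        # regress each input x from its observed score y
        optimizer.zero_grad()
        loss = ((model(y_batch) - x_batch) ** 2).mean()
        loss.backward()
        optimizer.step()
        return loss.item()

    # After training, propose promising inputs by conditioning on a high score:
    # candidates = model(torch.full((64, 1), y_target))
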
  4. Jan 1

    Also worth mentioning concurrent work on this: And our own parallel paper that explains a related idea for model-based optimization (which was on openreview before):

  5. Jan 1

    The method we get from this is not as effective as state-of-the-art RL, but it is simple -- just supervised learning conditioned on return or advantage. There is also an interesting derivation that explains why this works, involving 2 ways to factorize p(tau, R) (see Sec 4.3)

  6. Jan 1

    Can suboptimal trials serve as optimal demos? Suboptimal trials are "optimal" for a policy that *aims* to be suboptimal. By conditioning policy on the reward (or advantage) we want it to get, we can use all trials as demos: w/ Aviral Kumar &

  7. Dec 23, 2019

    Check out Glen's post "Emergent Behavior by Minimizing Chaos": Minimizing surprise can lead to complex behaviors that maintain homeostasis, like playing vizDoom, balancing, or learning to play tetris, all w/o any reward. Paper:

  8. Retweeted
    Dec 13, 2019

    I'll be discussing this work and other challenges in meta-learning at the Bayesian Deep Learning Workshop at 1:20 pm, West Exhibition Hall C.

  9. Dec 13, 2019

    Laura & Marvin’s blog post on this work is now out:

  10. Dec 12, 2019

    Goal-conditioned policies can be learned with RL. But it's simpler to use imitation (supervised learning). Turns out that you can get good goal-conditioned policies by imitating your *own* data, without any human demonstrations! We call it GCSL:

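    A rough sketch of the "imitate your own data" recipe, assuming the key ingredient is hindsight relabeling: any state reached later in one of the agent's own trajectories is treated as the goal the earlier actions were demonstrating. The sampling scheme and policy interface below are illustrative assumptions (states and actions are assumed to be stored as torch tensors).

    import random
    import torch

    def hindsight_batch(trajectories, batch_size=256):
        """trajectories: lists of (state, action) pairs. Relabel a later state as the goal."""
        states, actions, goals = [], [], []
        usable = [t for t in trajectories if len(t) >= 2]
        for _ in range(batch_size):
            traj = random.choice(usable)
            t = random.randrange(len(traj) - 1)
            t_goal = random.randrange(t + 1, len(traj))
            s, a = traj[t]
            g, _ = traj[t_goal]
            states.append(s); actions.append(a); goals.append(g)
        return torch.stack(states), torch.stack(actions), torch.stack(goals)

    def train_step(policy, optimizer, trajectories):
        s, a, g = hindsight_batch(trajectories)
        # plain supervised learning: regress the action the agent itself took,
        # given the state and the relabeled goal (policy is any module mapping
        # (state, goal) -> action; a likelihood loss also works)
        loss = ((policy(s, g) - a) ** 2).mean()
        optimizer.zero_grad(); loss.backward(); optimizer.step()
        return loss.item()
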
  11. Dec 12, 2019

    Model-Based Reinforcement Learning: Theory and Practice. New blog post about the taxonomy of model-based RL methods, when we should use models, and the state-of-the-art MBPO algorithm for sample-efficient RL.

  12. Dec 11, 2019

    Typically, we think of intrinsic motivation as _maximizing_ surprise. But agents in complex worlds with unexpected events can learn meaningful behaviors by _minimizing_ surprise, leading to behaviors that seek out homeostasis: Can learn vizdoom w/o reward

  13. Dec 11, 2019

    How can we get robots to imitate humans? CycleGAN can turn human videos into robot videos, and model-based RL can execute actions that track these videos. w/ Laura Smith, N. Dhawan, M. Zhang,

  14. Dec 11, 2019

    Goal-conditioned policies can also be used to plan over graphs constructed from a replay buffer! Presented tomorrow (Thu) by Ben Eysenbach, 10:45 am, poster #220. Colab here:

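    A sketch of the graph-planning idea, under the assumption that edge costs come from a learned goal-conditioned distance estimate; the `distance_fn`, the edge threshold, and the use of networkx are placeholders for illustration.

    import networkx as nx

    def build_graph(buffer_states, distance_fn, max_edge_dist=5.0):
        g = nx.DiGraph()
        for i, s in enumerate(buffer_states):
            g.add_node(i, state=s)
        for i, s in enumerate(buffer_states):
            for j, s2 in enumerate(buffer_states):
                if i == j:
                    continue
                d = distance_fn(s, s2)      # estimated steps to get from s to s2
                if d < max_edge_dist:       # only keep edges deemed reachable
                    g.add_edge(i, j, weight=d)
        return g

    def plan_subgoals(g, start_idx, goal_idx):
        path = nx.shortest_path(g, start_idx, goal_idx, weight="weight")
        return [g.nodes[i]["state"] for i in path]

    # The goal-conditioned policy is then executed toward each subgoal in turn.
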
  15. Dec 11, 2019

    If we learn visual goal-conditioned policies, we can plan over goals and solve long-horizon tasks. Come see Soroush Nasiriany present LEAP tomorrow at 10:45 am, poster #218.

  16. Dec 10, 2019

    Come learn about how to effectively learn policies from entirely off-policy data with Bootstrap Error Accumulation Reduction (BEAR), presented tomorrow (Wed) by Aviral Kumar at 5:30 pm, poster #214

  17. Dec 10, 2019

    Compositional plan vectors allow us to add (and subtract) "plans" in a learned embedding space, enabling one-shot imitation and composition of tasks. Find out how to train CPVs tomorrow, at our poster presentation (w/ Coline Devin and Daniel Geng). At 10:45 am #191

  18. Dec 10, 2019

    Unsupervised meta-RL with images: use variational "task clustering" to auto-generate tasks for meta-learning without any reward function, and then learn to adapt quickly to new tasks in environments with visual observations. Presented tomorrow (spotlight + poster)

  19. Dec 10, 2019

    Come find out how HRL algorithms can utilize multiple primitives at the same time, and solve some complex long-horizon control problems, like t-rex ball dribbling!

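    One way several primitives can act "at the same time" is to combine their Gaussian action distributions as a weighted product; a minimal sketch, assuming Gaussian primitives and a separate gating network that supplies the weights (both are assumptions for illustration, not details from the tweet):

    import numpy as np

    def compose_primitives(means, stds, weights):
        """Weighted product of K Gaussian primitives, per action dimension.
        means, stds: arrays of shape (K, action_dim); weights: shape (K,),
        nonnegative with at least one positive entry. A weighted product of
        Gaussians is again a Gaussian with precision-weighted mean, so every
        primitive contributes to the action simultaneously."""
        w = weights[:, None]                      # (K, 1)
        precisions = w / (stds ** 2)              # (K, action_dim)
        total_precision = precisions.sum(axis=0)
        mean = (precisions * means).sum(axis=0) / total_precision
        std = np.sqrt(1.0 / total_precision)
        return mean, std                          # composed action distribution

    # usage (illustrative): the weights come from a gating network conditioned on
    # the state and goal; the composed Gaussian is then sampled to act.
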
