Rohin Shah

@rohinmshah

PhD student at the Center for Human-Compatible AI at UC Berkeley. I publish the Alignment Newsletter.

Joined October 2017

Tweets

  1. Pinned Tweet
    8 Jul 2018

    I publish the Alignment Newsletter, a weekly publication with recent content relevant to AI alignment. Check it out!

  2. Jan 29

    Wondering what the field of long-term AI safety does, but don't want to read hundreds of posts? Check out my review of work done in 2018-19! Please do leave comments and suggestions. The summary is also Alignment Newsletter #84.

  3. Jan 22

    [Alignment Newsletter #83]: Sample-efficient deep learning with ReMixMatch

  4. Jan 15

    [Alignment Newsletter #82]: How OpenAI Five distributed their training computation

  5. Jan 8

    [Alignment Newsletter #81]: Universality as a potential solution to conceptual difficulties in intent alignment

  6. Jan 2

    Alignment Newsletter #80: Why AI risk might be solved without additional intervention from longtermists

  7. Jan 1

    Alignment Newsletter #79: Recursive reward modeling as an alignment technique integrated with deep RL

  8. 25 Dec 2019

    Alignment Newsletter #78: Formalizing power and instrumental convergence, and the end-of-year AI safety charity comparison

  9. 18 Dec 2019

    [Alignment Newsletter #77]: Double descent: a unification of statistical theory and modern ML practice

  10. Retweeted
    12 Dec 2019

    Shout out to @rohinmshah for his impressive AI alignment newsletter. If you want to keep up to speed with what is going on in the field of AI alignment, there's nothing better. His team has summarised 1,200 papers to date!

  11. Retweeted
    10 Dec 2019

    Want to ensure AI is beneficial for society? Come talk to like-minded people at the Human-Aligned AI Social, Thursday 7-10 pm, room West 205-207.

  12. 4 Dec 2019

    [Alignment Newsletter #76]: How dataset size affects robustness, and benchmarking safe exploration by measuring constraint violations

  13. 27 Nov 2019

    [Alignment Newsletter #75]: Solving Atari and Go with learned game models, and thoughts from a MIRI employee

  14. 20 Nov 2019

    [Alignment Newsletter #74]: Separating beneficial AI into competence, alignment, and coping with impacts

  15. 13 Nov 2019

    [Alignment Newsletter #73]: Detecting catastrophic failures by learning how agents tend to break

  16. 6 Nov 2019

    [Alignment Newsletter #72]: Alignment, robustness, methodology, and system building as research priorities for AI safety

  17. 30 Oct 2019

    [Alignment Newsletter #71]: Avoiding reward tampering through current-RF optimization

  18. 23 Oct 2019

    [Alignment Newsletter #70]: Agents that help humans who are still learning about their own preferences

  19. 21 Oct 2019

    Real humans adapt to the opaque protocols that self-play (SP) learns, and play differently than the naive behavior-cloned model that our agent was trained against, so the effect is smaller. Nonetheless, the human-aware agent still does better, sometimes beating human performance! (4/4)

  20. 21 Oct 2019

    We need an agent that has the right “expectation” about its partner. Obvious solution: train a human model with behavior cloning, and then train an agent to play well with that model. This does way better than SP in simulation (i.e. evaluated against a “test” human model). (3/4) [A toy code sketch of this recipe follows the thread below.]

  21. 21 Oct 2019

    In competitive games, the minimax theorem allows self-play to be agnostic to its opponent: if they are suboptimal, SP will crush them even harder. That doesn’t work in collaborative games, where the partner’s suboptimal move and SP’s failure to anticipate it will hurt. (2/4)

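The thread above (read bottom to top, 2/4 through 4/4) describes a concrete recipe: behavior-clone a model of the human partner from demonstrations, then train the agent as a best response to that cloned model rather than to a copy of itself. Below is a minimal toy sketch of that recipe in a made-up 3x3 collaborative matrix game; the payoff matrix, the synthetic "human" data, and all names here are hypothetical illustrations, not the authors' actual environment or code.

    import numpy as np

    rng = np.random.default_rng(0)

    # Collaborative 3x3 matrix game: both players receive payoff[a1, a2].
    # Coordinating on action 0 is optimal, but the "human" convention favors action 2.
    payoff = np.array([[4.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0],
                       [0.0, 0.0, 3.0]])
    n_actions = 3

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    # 1. Self-play (SP): one shared policy, gradient ascent on E[payoff] = p' P p.
    logits = np.zeros(n_actions)
    for _ in range(2000):
        p = softmax(logits)
        grad_p = (payoff + payoff.T) @ p                          # d(p' P p) / dp
        logits += 0.1 * (np.diag(p) - np.outer(p, p)) @ grad_p    # chain rule through softmax
    sp_policy = softmax(logits)  # converges near "always play action 0"

    # 2. Behavior cloning: fit the human's action distribution from (synthetic) demos.
    human_actions = rng.choice(n_actions, size=500, p=[0.1, 0.1, 0.8])
    bc_model = np.bincount(human_actions, minlength=n_actions) / len(human_actions)

    # 3. Best response (BR): maximize expected payoff against the cloned model.
    br_policy = np.eye(n_actions)[np.argmax(payoff @ bc_model)]

    def expected_payoff(agent, partner):
        return agent @ payoff @ partner

    print("SP vs cloned human:", expected_payoff(sp_policy, bc_model))  # ~0.4
    print("BR vs cloned human:", expected_payoff(br_policy, bc_model))  # ~2.4

Self-play settles on its own convention (action 0) and coordinates poorly with the cloned human, while the best-response agent adopts the human's convention (action 2), which is the thread's point in miniature.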
