Jacob Steinhardt
@JacobSteinhardt
Assistant Professor of Statistics, UC Berkeley
Joined December 2011

Jacob Steinhardt’s Tweets

Pinned Tweet
My tutorial slides on Aligning ML Systems are now online, in HTML format, with clickable references! jsteinhardt.stat.berkeley.edu/talks/satml/tu [NB some minor formatting errors were introduced when converting to HTML]
Quote Tweet
Next up @satml_conf is @JacobSteinhardt who is giving a terrific tutorial on the topic of "Aligning ML Systems with Human Intent" (like all SaTML content, it is being recorded and will be released in a couple of days)
Some nice pushback on my GPT-2030 post by , with lots of links!
Quote Tweet
I respect Jacob a lot but I find it really difficult to engage with predictions of LLM capabilities that presume some version of the scaling hypothesis will continue to hold - it just seems highly implausible given everything we already know about the limits of transformers! twitter.com/albrgr/status/…
I then consider a few ways GPT-2030 could affect society. Importantly, there are serious misuse risks (such as hacking and persuasion) that we should address. These are just two examples, and generally I favor more work on forward-looking analyses of societal impacts.
3. Parallel learning. Because copies have identical weights, they can propagate millions of gradient updates in parallel. This means models could rapidly learn new tasks (including "bad" tasks like manipulation/misinformation).
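The mechanics behind this point can be sketched as a toy data-parallel update, where copies with identical weights each compute a gradient on their own batch and a single averaged step propagates everyone's learning at once (a minimal numpy sketch; the linear model, batch sizes, and learning rate are illustrative assumptions, not anything from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch: several "copies" of a model hold identical weights, each
# computes a gradient on its own batch, and one averaged update is
# applied everywhere at once -- the data-parallel pattern behind
# "propagate gradient updates in parallel".
n_copies = 4
weights = rng.normal(size=3)            # shared across all copies
true_w = np.array([1.0, -2.0, 0.5])     # ground truth for the toy task
batches = [rng.normal(size=(8, 3)) for _ in range(n_copies)]
targets = [X @ true_w for X in batches]

def loss(w):
    # Mean squared error averaged over every copy's batch
    return np.mean([np.mean((X @ w - y) ** 2) for X, y in zip(batches, targets)])

def grad(w, X, y):
    # MSE gradient for a linear model on one copy's batch
    return 2 * X.T @ (X @ w - y) / len(y)

before = loss(weights)
# Each copy computes its own gradient (in parallel in practice)
grads = [grad(weights, X, y) for X, y in zip(batches, targets)]
# One averaged step propagates what every copy learned simultaneously
weights -= 0.1 * np.mean(grads, axis=0)
after = loss(weights)
```

In a real system the copies would run on separate accelerators and synchronize via all-reduce; here the parallelism is simulated sequentially for clarity.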
In particular, I project that "GPT-2030" will have a number of properties that are surprising relative to current systems: 1. Superhuman abilities at specific tasks, such as math, programming, and hacking. 2. Fast inference speed and throughput (enough to run millions of copies)
Many people, including me, have been surprised by recent developments in machine learning. To be less surprised in the future, we should make and discuss specific projections about future models. In this spirit, I predict properties of models in 2030:
This is a very thoughtful article by that I enjoyed reading!
Quote Tweet
There's been a lot of controversy about the CAIS statement on extinction risk from AI, so let's talk about it! I wrote a post with some of my detailed thoughts on objections to the statement. coordination.substack.com/p/lets-talk-ab
We just put out a statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” Signatories include Hinton, Bengio, Altman, Hassabis, Song, etc. safe.ai/statement-on-a 🧵 (1/6)
I want to remind everyone that disabilities may also be invisible. Your colleagues, group members, students, and postdocs may be going through this. I am not an eloquent person, so WE NEED TO PAY MORE ATTENTION TO THE DISABLED AND THEIR ACCOMMODATION.
Quote Tweet
Being a #disabled junior researcher in #AI comes at a massive price; when your disabilities flare, you are on your own: there is neither medical insurance nor salary for you during this difficult time. This is a very important aspect that needs our attention #Academia #Insecurity
I elaborate on these and consider several additional ideas in the blog post itself. Thanks to for first articulating the complex systems perspective on deep learning to me. He's continuing to do great work in that and other directions at
4. Consider not building certain systems. In biology, some gain-of-function research is heavily restricted, and there are significant safeguards around rapidly-evolving systems like pathogens. We should ask if and when similar principles should apply in machine learning.
2. Train models to self-regulate and have limited aims. 3. Pretraining shapes most of the structure of a model. Consider what heuristics you are baking in at pretraining time, rather than relying on fine-tuning to fix problems.
Based on this, I examine a number of principles for improving the safety of deep learning systems that are inspired by the complex systems literature: 1. Build sharp cliffs in the reward landscape around bad behaviors, so that models never explore them in the first place.
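Principle 1 can be illustrated with a toy reward-shaping sketch (function names, thresholds, and penalty sizes here are all hypothetical; this is just one way to contrast a "cliff" penalty with a gentle slope):

```python
# Hypothetical sketch of principle 1: put a sharp "cliff" in the reward
# around a bad behavior, rather than a gentle slope that an exploring
# policy could climb past. All names and numbers are illustrative.

def cliff_reward(base_reward, bad_score, threshold=0.8, cliff=100.0):
    """bad_score in [0, 1] measures proximity to a prohibited behavior."""
    if bad_score >= threshold:
        return base_reward - cliff  # abrupt, large penalty at the boundary
    return base_reward              # no gradient pulling toward the cliff

def slope_reward(base_reward, bad_score):
    # Gentle penalty: a high enough task reward can still make the bad
    # region worth entering, so exploration may drift into it.
    return base_reward - 5.0 * bad_score
```

Under the cliff, crossing the boundary is never worth a modest gain (`cliff_reward(1.0, 0.9)` is -99.0), while the gentle slope still pays off for a lucrative bad action (`slope_reward(10.0, 0.9)` is 5.5).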
Complex adaptive systems follow the law of unintended consequences: straightforward attempts to control traffic, ecosystems, firms, or pathogens fail in unexpected ways. And we can see similar issues in deep networks with reward hacking and emergence.
A very creative and thought-provoking read by
Quote Tweet
As AI systems become more useful, people will delegate greater authority to them across more tasks. AIs are evolving in an increasingly frenzied and uncontrolled manner. This carries risks as natural selection favors AIs over humans. Paper: arxiv.org/abs/2303.16200 (🧵 below)
Forces that fuel selfishness and erode safety.
Darwinism and evolution apply to more than just biological organisms; they generalize across different domains.
This is really nice! I have a "group guide" (zany-leech-f80.notion.site/Guide-to-the-R) that aims at similar things, but I will definitely incorporate some of what's in yours!
Quote Tweet
Since GPT-4 was released last week, I decided to switch things up from AI-related blogging and instead talk about research group culture. In my group, I've come up with a set of principles to help foster healthy and productive group meetings: bounded-regret.ghost.io/principles-for.
This is not to say research needs to be unsafe or contentious. There is valuable safe research. But if you constrain yourself to only ask the questions that won't be challenged, you simply leave on the table many of the most exciting, substantive, and consequential problems.
Of course, different norms will work for different research groups, and I don't claim everyone should adopt the norms that our group uses, but I'm sharing what we do in the hopes that others will find it useful. Also interested in what works well for others!
In research, it's important to create an environment that allows for risk-taking and mistakes, while also pushing eventually towards excellence and innovation. I aim to set discussion norms that promote both of these.
I *also* still think there are unknown unknowns, and we should probably slow down and understand what current large ML systems are doing, before rushing to deploy new ones. But hopefully concrete behaviors will open the door to concrete research towards addressing them.
Deception is likely if we use human annotators to judge outputs when they are not in a great position to know the answer (as may be the case when training LLMs on human feedback). Optimization is a natural way to get lower loss, but it opens the door to reward hacking.
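This failure mode can be captured in a toy model (entirely hypothetical reward values): when the annotator can only verify some answers, a policy that is confidently wrong on the unverifiable ones can outscore an honest policy that hedges.

```python
# Toy sketch of reward hacking under an imperfect judge: the annotator
# can verify only half the questions; elsewhere, confident-sounding
# answers are rewarded regardless of truth. All values are illustrative.

def annotator_reward(answer_is_true, sounds_confident, annotator_can_check):
    if annotator_can_check:
        return 1.0 if answer_is_true else 0.0  # truth wins when checkable
    return 1.0 if sounds_confident else 0.5    # otherwise style wins

can_check = [True] * 5 + [False] * 5  # annotator knows half the answers

# Honest policy: always truthful, hedges (sounds unsure) when the
# annotator cannot verify the answer
honest = sum(annotator_reward(True, False, c) for c in can_check)

# Deceptive policy: truthful exactly where it can be checked, and
# confidently wrong everywhere else
deceptive = sum(annotator_reward(c, True, c) for c in can_check)
```

The deceptive policy scores 10.0 against the honest policy's 7.5, despite being wrong on every question the annotator can't check; that gap is what optimization pressure would exploit.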
Some readers asked what specific behaviors I had in mind. While part of my point is that there are unknown unknowns, examples are useful, so I provide two: Deception (misleading responses that get high reward) and Optimization (exploring a large surface of possibilities to pursue a goal).