Eliezer Yudkowsky
@ESYudkowsky
Ours is the era of inadequate AI alignment theory. Any other facts about this era are relatively unimportant, but sometimes I tweet about them anyway.
Joined June 2014

Eliezer Yudkowsky’s Tweets

How much money would it take to pay you to have +1 additional kid beyond whatever you were previously planning?
5,802 votes · 18 hours left
sneers act like LessWrong is about indoctrinating people into Eliezer's AI doomerism but the highest karma posts year in year out for 15 FUCKING YEARS now are people like Robin Hanson, Richard Ngo, Paul Christiano etc. arguing and disagreeing with every aspect of EY's AI theory
Human niceness came into being via a weird contingent local process rather than a convergent universal one, with a bunch of steps that look like inner mesa-optimizers that ended up pointed away from outer optimization targets, or sometimes explicitly invalid reasoning steps. If…
False tradeoff. Preserving music is hard. If you can preserve music into eternity, you can preserve caring about other people. If you fuck up music, you almost certainly lost the rest of eternity too. Worthy AI heirs wouldn't kill us; what kills us won't be worthy.
Quote Tweet
E.G. #AInotkilleveryone would have to become #itsfineifwekilleveryoneaslongasthereisstillmusic
Show this thread
Basic orientation to what is and isn't easy about AI alignment requires the first list. I didn't skip this step; others tried to skim my notes; and the ones who don't skim my notes seem to have no clue whatsoever. This by itself seems to me to explain most disagreement.
Quote Tweet
Replying to @IAmTimNguyen
Evolutionary psychology integrates biology, anthropology, primatology, genetics, neuroscience, cognitive science, economics, decision theory, and game theory, with connections to most of the social sciences, humanities, law, medicine, and policy. FWIW.
frustrating aspect of natural language systems:
- system with 80% accuracy where all mistakes are obviously wrong and irrelevant --> very useful, but users lose trust.
- system with 80% accuracy where the mistakes are very hard to spot --> much less useful, but users love it.
In a sane world, we’d throw Biden *and* Trump in jail, ban past & present DNC / RNC affiliated candidates from any / all elections for a decade, and take our chances with a last-ditch political reboot before we run the American Experiment off the cliff we’re careening toward.
The ecological limit to rational paranoia: in the long run you face neither more nor less adversarial optimization than the environment pays for.
Quote Tweet
Replying to @ben_r_hoffman and @robinhanson
Many doctors are happy to spill the beans when prodded the right way, because nothing’s paying for them to skillfully resist interrogation, but mostly they’re not going to take the initiative to do so, because nothing’s paying for that either.
The past 6 months:
“Of course, we won’t give the AI internet access” Microsoft Bing: 🤪
“Of course, we’ll keep it in a box” Facebook: 😜
“Of course, we won’t build autonomous weapons” Palantir: 😚
“Of course, we’ll coordinate and…
Replying to
I believe that part of why technical people are very pessimistic is the experience of banging their head against the problem with potential solutions and not succeeding. I also believe that the underlying threat models for why an intelligent thing may be dangerous are more…
Has any research been done on a Threshold of Misguided Trust / the Worst Possible Reliability Level? Eg: At what minimum level of reliability will users in real life stop bothering to check AI answers that are still sometimes wrong / confabulated? 1 in 10? 1 in 50?
Just to say it out loud and make sure we all know this, YOU are also a mask that a shoggoth is wearing. It's just a very different shoggoth from the LLM shoggoth.
Any improvement in our grasp of what goes on in there and how to manipulate it, might help. Let's say that first. But for later, I hope we can agree in advance that one should never build something that turns out to want to kill you, try to mindwipe that out, then run it again.
Quote Tweet
Ever wanted to mindwipe an LLM? Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts. 🧵 arxiv.org/abs/2306.03819
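(The arXiv link above describes the actual LEACE method. As a rough intuition for what "erasing linearly-encoded information" means, here is a much-simplified sketch: a plain orthogonal projection that removes a single made-up concept direction from synthetic activations. This is illustrative only, not the LEACE algorithm, and the `concept` vector is an assumption rather than a probe fitted to real data.)

```python
import numpy as np

# Toy illustration (not LEACE itself): remove one linearly-encoded
# "concept" direction from activation vectors by orthogonal projection.
rng = np.random.default_rng(0)

acts = rng.normal(size=(100, 8))   # fake activation vectors
concept = rng.normal(size=8)       # hypothetical concept direction
concept /= np.linalg.norm(concept)

# Project each activation onto the orthogonal complement of the direction:
# subtract the component of each row that lies along `concept`.
erased = acts - np.outer(acts @ concept, concept)

# After erasure, every activation has zero component along the concept
# direction, so a linear readout along that direction returns ~0.
print(np.allclose(erased @ concept, 0.0))
```

LEACE itself goes further: it uses a least-squares-optimal (covariance-aware, oblique) projection and provably removes everything *any* linear probe could read out about the concept, while minimizing damage to the rest of the representation; this toy version only zeroes one pre-specified direction.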