New research from OpenAI used GPT-4 to label all 307,200 neurons in GPT-2, giving each one a plain-English description of the role it plays in the model.
This opens up a new direction in AI explainability and alignment: models whose internals can be described in natural language may be easier to interpret and, potentially, easier to align.
openai.com/research/langu
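The idea behind the paper can be sketched in a few lines: show a model a neuron's activations over some text, ask it for a plain-English explanation, then score that explanation by how well activations predicted from it match the real ones. This is a minimal runnable sketch of that loop; `explainer` and `simulator` here are toy stand-ins for the GPT-4 calls, and the neuron record is invented for illustration.

```python
# Hedged sketch of an explain/simulate/score loop for labeling neurons.
# The real pipeline queries GPT-4; explainer/simulator below are toy
# placeholders so the overall structure runs without an API.

from dataclasses import dataclass

@dataclass
class NeuronRecord:
    neuron_id: tuple        # (layer, index) -- hypothetical identifier
    tokens: list            # text excerpts the neuron was run on
    activations: list       # the neuron's real activation per token

def explainer(record):
    # Stand-in for: "ask GPT-4 to describe what this neuron fires on",
    # given (token, activation) pairs. Here we just name the top token.
    top_token = max(zip(record.activations, record.tokens))[1]
    return f"fires on tokens like '{top_token}'"

def simulator(explanation, tokens):
    # Stand-in for: "ask GPT-4 to predict activations from the
    # explanation alone". Here we match the quoted token literally.
    key = explanation.split("'")[1]
    return [1.0 if t == key else 0.0 for t in tokens]

def correlation(xs, ys):
    # Pearson correlation between real and simulated activations.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def score_neuron(record):
    explanation = explainer(record)
    simulated = simulator(explanation, record.tokens)
    return explanation, correlation(record.activations, simulated)

# Tiny made-up example: a neuron that seems to fire on money words.
record = NeuronRecord((5, 131), ["the", "dollar", "price", "of"],
                      [0.1, 0.9, 0.8, 0.0])
explanation, score = score_neuron(record)
```

A high score means the explanation lets a simulator reproduce the neuron's behavior; a low score means the label is a poor summary, which is how the paper separates well-explained neurons from opaque ones.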
has slightly updated his priors based on this:
Commentary at greater length:
- I'm encouraged that somebody ran right out and tried this.
- It's not clear (to me, yet) that it worked all that well, or better than expected; I have not yet significantly updated my model of how technically hard interpretability is.
- It is… twitter.com/blader/status/…
This is cool. But what would it take to label the neurons in GPT-4? Do we need GPT-6?
This is awesome 🙌🏻: using AI to understand how ML models work.
Maybe it could help explain why instruct models trained only on English text generate answers in other languages, as was asked a couple of months ago.
Bringing light to the darkness 😉