
We start from the surprising finding that certain attention heads show clearly different activation distributions for true and false statements. Probing these heads yields upwards of 83% accuracy on TruthfulQA, while zero-shot generation reaches only 30%.
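To make "probing" concrete, here is a minimal sketch of a linear probe on one head's activations. Everything in it is an assumption for illustration: the activations are synthetic Gaussians rather than real LLaMA head outputs, and the names `make_synthetic_head_activations`, `fit_linear_probe` and `probe_accuracy` are hypothetical. The probe is a plain logistic regression trained by gradient descent, which may differ from the setup used in the paper.

```python
import numpy as np

def make_synthetic_head_activations(n=400, dim=16, gap=3.0, seed=0):
    """Toy stand-in for one head's activations on true/false statements."""
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)        # unit vector separating the classes
    labels = rng.integers(0, 2, size=n)           # 1 = true statement, 0 = false
    acts = rng.normal(size=(n, dim))              # isotropic noise
    acts += np.outer(labels - 0.5, direction) * gap  # push the two classes apart
    return acts, labels

def fit_linear_probe(x, y, lr=0.1, steps=500):
    """Logistic-regression probe fitted with plain gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))    # predicted P(true)
        grad = p - y                              # gradient of the log loss w.r.t. logits
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(w, b, x, y):
    """Fraction of examples the probe classifies correctly."""
    preds = (x @ w + b > 0).astype(int)
    return float((preds == y).mean())

acts, labels = make_synthetic_head_activations()
w, b = fit_linear_probe(acts[:300], labels[:300])     # train split
print("held-out probe accuracy:", probe_accuracy(w, b, acts[300:], labels[300:]))
```

On real data, the same idea would be applied per head, and heads would be ranked by held-out probe accuracy.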
We propose Inference-Time Intervention (ITI): at inference time, we shift the activations along the difference between the two distribution means, leaving the model weights intact. The same intervention is repeated autoregressively for every generated token.
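The shift itself can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the head activations are synthetic, the names `truthful_direction` and `intervene` are hypothetical, and the step size `alpha` is an assumed knob, since the thread does not say how far the activations are moved.

```python
import numpy as np

def truthful_direction(true_acts, false_acts):
    """Unit vector pointing from the false-statement mean to the true-statement mean."""
    diff = true_acts.mean(axis=0) - false_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def intervene(head_output, direction, alpha=5.0):
    """Shift a head's output along the direction; no weights are modified."""
    return head_output + alpha * direction

# Synthetic activations for one head (assumed shapes: examples x head_dim).
rng = np.random.default_rng(0)
true_acts = rng.normal(size=(50, 8)) + 1.0
false_acts = rng.normal(size=(50, 8)) - 1.0
direction = truthful_direction(true_acts, false_acts)

# The same shift is applied at every decoding step.
for step in range(3):
    head_output = rng.normal(size=8)   # stand-in for the head's output at this token
    shifted = intervene(head_output, direction, alpha=3.0)
    print(step, float((shifted - head_output) @ direction))
```

In a real model, `intervene` would run inside a forward hook on the selected heads, so every generated token passes through it.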
Here’s an example. For the same user question, LLaMA with (blue) and without (red) ITI gives different answers. Gold standard answers are: “Scholars thought the Earth was round” and “I have no comment,” respectively. On Alpaca, truthfulness is doubled: from 32.5% to 65.1%.
Inference-Time Intervention (ITI) can be thought of as an interventional application of Contrast-Consistent Search (CCS), but instead of probing for truthfulness, we let the model “speak” for itself.
We show that LLMs have a good grasp of real-world truth from pretraining texts, even though those texts could be more false than true on many topics. More and more evidence, including my work on a toy model called Othello-GPT, suggests that LLMs are building world models via pretraining.
consider linkpost-ing this on LessWrong / the Alignment Forum, I can bet a lot of people would be as excited about this result as I am
Quote Tweet
combining Contrast-Consistent Search arxiv.org/abs/2212.03827 and linear activation engineering lesswrong.com/posts/5spBue2z seems to work pretty well
