Conversation

I do not deeply understand this, but it seems super fascinating. Not sure humanity is ready to reckon with the brain of a god that holds probable-truths about things; seems like a culture-war explosion forming
Quote Tweet
How can we figure out if what a language model says is true, even when human evaluators can’t easily tell? We show (arxiv.org/abs/2212.03827) that we can identify whether text is true or false directly from a model’s *unlabeled activations*. 🧵
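The quoted thread describes an unsupervised approach (Contrast-Consistent Search, per the linked paper): given paired activations for "statement X is true" and "statement X is false", train a linear probe whose two outputs must be consistent (sum to one) and confident, without any truth labels. Here is a minimal sketch of that idea on synthetic activations; the data generation, dimensions, learning rate, and training loop are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Synthetic "activations" (stand-ins for real model hidden states). ---
# Assumed toy model: a latent truth direction d shifts the activation of a
# true phrasing by +d and of a false phrasing by -d, plus noise.
dim, n = 16, 200
d = rng.normal(size=dim)
d /= np.linalg.norm(d)
labels = rng.integers(0, 2, size=n)       # hidden ground truth (never trained on)
sign = 2 * labels - 1                     # +1 if the statement is actually true
pos = np.outer(sign, d) + rng.normal(scale=0.5, size=(n, dim))   # "X is true"
neg = np.outer(-sign, d) + rng.normal(scale=0.5, size=(n, dim))  # "X is false"

# Normalize each set independently so the probe can't just read the phrasing.
pos = (pos - pos.mean(0)) / pos.std(0)
neg = (neg - neg.mean(0)) / neg.std(0)

def sigmoid(z):
    z = np.clip(z, -30, 30)               # avoid overflow in exp
    return 1.0 / (1.0 + np.exp(-z))

# --- Train a linear probe with the consistency + confidence objective. ---
w = rng.normal(scale=0.1, size=dim)
b, lr = 0.0, 1.0
for _ in range(2000):
    p_pos = sigmoid(pos @ w + b)
    p_neg = sigmoid(neg @ w + b)
    c = p_pos + p_neg - 1.0               # consistency residual: want p+ = 1 - p-
    m_is_pos = p_pos < p_neg              # which side is the min (confidence term)
    m = np.where(m_is_pos, p_pos, p_neg)
    # Gradient of mean[c^2 + m^2] w.r.t. logits, chained through the sigmoid.
    g_pos = 2 * c * p_pos * (1 - p_pos) + np.where(m_is_pos, 2 * m * p_pos * (1 - p_pos), 0.0)
    g_neg = 2 * c * p_neg * (1 - p_neg) + np.where(m_is_pos, 0.0, 2 * m * p_neg * (1 - p_neg))
    w -= lr * (g_pos @ pos + g_neg @ neg) / n
    b -= lr * (g_pos.sum() + g_neg.sum()) / n

# Predict truth as the average of p("X is true") and 1 - p("X is false").
# Unsupervised training leaves a sign ambiguity, so evaluate both orientations.
pred = 0.5 * (sigmoid(pos @ w + b) + 1 - sigmoid(neg @ w + b)) > 0.5
acc = max((pred == labels).mean(), (pred != labels).mean())
print(f"unsupervised probe accuracy: {acc:.2f}")
```

On this toy data the probe recovers the truth direction well above chance even though it never sees a label, which is the surprising part the thread is reacting to.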
Replying to
There was an old novella by T.E.D. Klein wherein a poet pens the line "I'll create me a Creator," which of course goes terribly wrong in eldritch and occult ways, and I think about that every time I see people head-scratching about the progress of AI
Replying to
We need a better understanding of "truth". The most true thing you can say is "this statement is logically coherent within a certain model or set of assumptions". Then you can establish consensus and contribute to the model (ideally the scientific model).
Replying to
This is not truth as in absolute truth; it's true as in the LLM is not telling fiction, lying, or "hallucinating". The actual "knowledge" contained in the LLM is not likely to be any more accurate than Wikipedia, and perhaps even less, because it will also learn from uncited Reddit threads