1/Large language models like Galactica and ChatGPT can spout nonsense in a confident, authoritative tone. This overconfidence - which reflects the data they’re trained on - makes them more likely to mislead.
2/In contrast, real experts know when to sound confident and when to let others know they're at the limits of their knowledge. They know, and can describe, the boundaries of what they know.
3/Building large language models that can accurately decide when to be confident and when not to will reduce the risk that they spread misinformation and help build trust.
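A minimal sketch of what "deciding when to be confident" could look like in practice, assuming access to per-token log-probabilities from some model; the numbers, threshold, and example answers below are invented for illustration, not output from any real system:

```python
# Sketch: turn per-token log-probabilities into a crude confidence
# signal so a model can abstain instead of answering confidently.
# All log-prob values here are made up for illustration.
import math

def sequence_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean probability of the generated tokens, in [0, 1]."""
    avg_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_logprob)

def answer_or_abstain(answer: str, token_logprobs: list[float],
                      threshold: float = 0.7) -> str:
    """Return the answer only if the confidence score clears the threshold."""
    conf = sequence_confidence(token_logprobs)
    if conf < threshold:
        return f"I'm not sure (confidence {conf:.2f}); this may be outside what I know."
    return answer

# Hypothetical usage with invented log-probs:
print(answer_or_abstain("Paris is the capital of France.", [-0.05, -0.02, -0.01, -0.04]))
print(answer_or_abstain("Galactica was released in 2019.", [-1.2, -0.9, -1.5, -0.8]))
```

This is only one crude signal; whether such scores are actually well calibrated is exactly the open problem the thread is pointing at.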
Great observation. It seems to me that when the prompt includes extra instructions (e.g., favor facts over assumptions), the output is more balanced. I tried asking about overconfidence while requesting greater correctness, and the output was good.
I'd be interested in how a tool like this would interact with someone like me in my field (clinical psychotherapy); I suspect it would present much like defensive, confabulating patients ... who knows?!
A car run by an FSD system that accelerates from 0 to 40 in 2 seconds when the light turns green may seem overconfident to passengers and nearby drivers. Is it more likely to crash?
Overconfidence is sometimes a matter of perception. It depends on whether the FSD system is built to analyze and adapt quickly.
New from Google Research! Language models perform amazing feats, but often still "hallucinate" unsupported content. Our model, RARR, automatically researches & revises the output of any LM to fix hallucinations and provide citations for each sentence. https://arxiv.org/abs/2210.08726