Conversation

Pay attention to the fine print on this 🪡 I offered to bet on his claim, and he defined “largely eliminated” as the model still goofing 20% of the time (!) That’s not going to bring the profound implications he promises, any more than 99%-correct driverless cars have.
Quote Tweet
LLM hallucinations will be largely eliminated by 2025. that’s a huge deal. the implications are far more profound than the threat of the models getting things a bit wrong today.
We can’t have providers of news, biography, medical info, legal info, etc., make stuff up 20% of the time. And if that’s the best Big Tech can do, we need to hold them legally responsible for their errors. All of them.
80% on any topic is still better than most humans. But anyway, it’s a fair point. That’s too low. How would you define “largely” then? Let’s go with 95%?
not a solve for the hallucination problem, if 1 in 20 is still just making stuff up. Comparison points:
- databases NEVER hallucinate
- calculators NEVER hallucinate
- search engines may point to bad info but NEVER hallucinate
- driverless cars generally thought to require 99.99999%
Like the way you try to hold these folks accountable for their statements. Traditional media is failing miserably there.
20%... I've kinda viewed the "long tail" as a lower percentage than that! It also sounds larger than "a bit wrong," which he claimed was the level for current tech.
I think part of the issue is that the hallucination rate will vary by subject area, so we should focus on niche solutions rather than holistic ones. For example, do you think solving hallucinations in medical information will look different from solving it for news information?
although - have you met humans? :) But yes, it would be good to nail down an evaluation you both agree on. The amount of $ owed between you could be a continuous function, maybe...?
Fine print: what does “x% of the time” mean to you? If the same API call gets it right 8 times out of 10, reliably, you can, for some apps, just run a bunch of extra API calls and play Minority Report to risk-tune.
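To make the Minority Report idea concrete, here is a minimal sketch of repeated sampling with a majority vote. It assumes each call is independently right with probability p and that correct answers repeat exactly, both strong assumptions, since hallucinations are often correlated across samples. call_model, the vote counts, and the 80% figure are illustrative, not any particular API.

```python
# Hedged sketch: re-ask the same question n times and take the majority answer,
# plus the math for how much that buys you if calls were truly independent.
from collections import Counter
from math import comb

def majority_answer(call_model, prompt, n=5):
    """Ask the same question n times and return the most common answer."""
    votes = Counter(call_model(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]

def p_majority_correct(p, n):
    """Probability a strict majority of n independent calls is correct,
    assuming each call is independently correct with probability p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

# With 80%-accurate calls: 5 votes ~0.942, 11 votes ~0.988 (under independence).
print(p_majority_correct(0.8, 5))
print(p_majority_correct(0.8, 11))
```

Under those assumptions, voting closes some of the gap between 80% and 95%, but it never reaches driverless-car territory, and in practice correlated errors make the gain smaller.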
1
Just curious, what additions would you suggest to help with the hallucination problem? Maybe some module on top of the base LLM that checks whether the generated info is plausible? Or maybe a completely different, new way of building an LLM from scratch?
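One hedged sketch of that "checker module" idea, in case it helps the discussion: generate a draft, have a second pass grade it against retrieved sources, and abstain if the draft can't be supported. generate, retrieve, and grade are hypothetical stand-ins passed in by the caller, not any real library's API.

```python
# Hedged sketch of a generate-then-verify loop on top of a base LLM.
def answer_with_check(question, generate, retrieve, grade, max_tries=3):
    sources = retrieve(question)              # e.g. search results or a curated corpus
    for _ in range(max_tries):
        draft = generate(question, sources)   # draft an answer grounded in the sources
        verdict = grade(draft, sources)       # second pass: "supported" or "unsupported"
        if verdict == "supported":
            return draft
    # Abstaining is the whole point: better no answer than a confident hallucination.
    return "I'm not confident enough to answer that."
```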