Realized one of the things that has been a nagging doubt about generative AI for me is that we don’t know in what way a particular model is *untrustworthy*
99% correct does not matter if you have no idea where the 1% wrongness will be lurking
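A toy expected-cost sketch (all numbers invented) of why the same 1% error rate can mean very different things depending on where the errors land:

```python
# Toy illustration, all numbers invented: two models with the same 1% error
# rate, differing only in *where* the errors land.
error_rate = 0.01
low_stakes_cost = 1        # cost of a wrong answer on a trivial query
high_stakes_cost = 10_000  # cost of a wrong answer on, say, a drug dose
high_stakes_share = 0.05   # fraction of queries that are high-stakes

# Model A: errors land uniformly at random across queries.
uniform_cost = error_rate * ((1 - high_stakes_share) * low_stakes_cost
                             + high_stakes_share * high_stakes_cost)

# Model B: same headline accuracy, but the errors cluster in the
# high-stakes queries.
clustered_cost = error_rate * high_stakes_cost

print(uniform_cost)    # ~5.0 expected cost per query
print(clustered_cost)  # 100.0 expected cost per query, same "99% correct"
```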
Like generated faces have… messed up ears
Text generators do small arithmetic ok but screw up larger arithmetic problems
Where does a coherent-sounding academic-paper summarizer tend to go wrong? Dangling prepositions are ok. A medical paper summary getting doses wrong, 😬😬
etc etc
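On the arithmetic example above: a minimal sketch of how one might empirically map that error distribution, assuming a hypothetical generate(prompt) call standing in for whatever text model is being probed:

```python
import random
import re

def probe_arithmetic(generate, trials=50):
    """Tally a text model's multiplication error rate by operand size.

    `generate(prompt) -> str` is a hypothetical stand-in for the model
    under test, not a specific library's API.
    """
    error_rate_by_digits = {}
    for digits in (2, 4, 8, 16):
        wrong = 0
        for _ in range(trials):
            a = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            b = random.randint(10 ** (digits - 1), 10 ** digits - 1)
            reply = generate(f"What is {a} * {b}? Answer with only the number.")
            match = re.search(r"-?\d[\d,]*", reply)
            answer = int(match.group().replace(",", "")) if match else None
            if answer != a * b:
                wrong += 1
        error_rate_by_digits[digits] = wrong / trials
    return error_rate_by_digits  # e.g. {2: 0.0, 4: 0.02, 8: 0.4, ...} (illustrative)
```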
With humans we very quickly form trust models; humans tend to be untrustworthy in predictable ways. With AIs we a) don’t, and b) might never be able to, if the models’ untrustworthiness distributions are themselves sensitive and fragile.
This is like a meta-trust problem.
ML models should come with wrongness distribution data sheets
Like material safety data sheets
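One possible shape for such a sheet, sketched as a plain data structure; the schema and every number here are invented, loosely in the spirit of model cards rather than any existing standard:

```python
# Hypothetical "wrongness datasheet" -- schema and numbers are invented.
wrongness_datasheet = {
    "model": "paper-summarizer-v3",  # made-up model name
    "overall_accuracy": 0.99,        # the headline number, and the least useful line
    "known_failure_modes": [
        {
            "slice": "numeric claims (doses, sample sizes)",
            "error_rate": 0.07,
            "severity": "high",
            "suggested_check": "cross-check every number against the source text",
        },
        {
            "slice": "author/affiliation attribution",
            "error_rate": 0.03,
            "severity": "medium",
            "suggested_check": "string-match against the paper's metadata",
        },
        {
            "slice": "grammar (dangling prepositions etc.)",
            "error_rate": 0.10,
            "severity": "low",
            "suggested_check": "none needed",
        },
    ],
    "unknown_unknowns": ["behavior on non-English papers untested"],
}
```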
Or like pharma ads. “May cause dizziness, nausea, weird ears, and impossible geometry. Ask your prompt engineer if dalletruda is right for you!” <generated image of oddly unsatisfying happy retirees hiking under non-Euclidean skies>
Mostly this has been happening either via second-order plumbing or prompt stylings, like the “step by step” prompt. But I don’t yet see a theory of error mapping and correction. It’s actually the dual of explainability: if you can explain your reasoning, you can map your wrongness.
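A sketch of what that second-order plumbing might look like, assuming a hypothetical generate(prompt) call: ask for steps, then audit whichever steps are machine-checkable (here, only simple arithmetic), so the explanation doubles as an error map:

```python
import re

def stepwise_with_audit(generate, question):
    """Ask for step-by-step reasoning, then audit the checkable steps.

    `generate(prompt) -> str` is a hypothetical stand-in for the model call.
    Only steps of the form "a + b = c" are verified here; everything else
    is marked unverified -- which is exactly the error-mapping gap.
    """
    reply = generate(f"{question}\nLet's think step by step, one step per line.")
    audit = []
    for step in reply.splitlines():
        m = re.search(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)", step)
        if not m:
            audit.append((step, "unverified"))
            continue
        a, op, b, claimed = int(m[1]), m[2], int(m[3]), int(m[4])
        actual = {"+": a + b, "-": a - b, "*": a * b}[op]
        audit.append((step, "ok" if actual == claimed else f"wrong, should be {actual}"))
    return audit
```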
Replying to @vgr
They should. Tho it’s the same problem as detecting AI images: if you could make an AI that knows when it’s wrong, you could also make an AI that isn’t wrong.
Replying to
yes! once they can build a debuggable AI model, the improvements will be exponential. Somehow build traceability into the neural net that shows how the results were determined.