Realized one of the things that has been a nagging doubt about generative AI for me is that we don’t know in what way a particular model is *untrustworthy*
99% correct does not matter if you have no idea where the 1% wrongness will be lurking
With humans we very quickly form trust models. Humans tend to be untrustworthy in predictable ways. With AIs we (a) don’t, and (b) might never be able to, if the models are sensitive and fragile in their untrustworthiness distribution.
This is like a meta-trust problem.
ML models should come with wrongness distribution data sheets
Like material safety data sheets
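A minimal sketch of what such a wrongness datasheet might look like as a data structure (every field name and number here is an illustrative assumption, not a measurement of any real model):

```python
from dataclasses import dataclass, field

@dataclass
class WrongnessDatasheet:
    """Hypothetical 'MSDS for models': where the residual errors tend to lurk."""
    model_name: str
    eval_date: str
    # Error rate broken out by task category, so users see *where* the
    # wrongness concentrates, not just a headline accuracy number.
    error_rate_by_category: dict = field(default_factory=dict)
    # Known failure modes, analogous to the hazard section of an MSDS.
    known_failure_modes: list = field(default_factory=list)

sheet = WrongnessDatasheet(
    model_name="example-llm-v1",      # hypothetical model
    eval_date="2023-01-15",
    error_rate_by_category={
        "arithmetic": 0.12,           # illustrative numbers only
        "citation accuracy": 0.30,
        "code generation": 0.08,
    },
    known_failure_modes=[
        "confidently fabricates references",
        "drops negations in long prompts",
    ],
)
print(sheet.error_rate_by_category)
```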
Or like pharma ads. “May cause dizziness, nausea, weird ears, and impossible geometry. Ask your prompt engineer if dalletruda is right for you!” <generated image of oddly unsatisfying happy retirees hiking under non-Euclidean skies>
Mostly this has been happening either via 2nd-order plumbing or prompt stylings, like the “step by step” prompt. But I don’t yet see a theory of error mapping and correction. It’s actually the dual of explainability: if you can explain your reasoning, you can map your wrongness.
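There is no standard tooling for this that I know of, but a crude empirical version of error mapping might look like the sketch below: run a labeled eval set through the model and tabulate error rates per category. `call_model` is a hypothetical stand-in for whatever inference API is in use, not a real library call:

```python
from collections import defaultdict

def call_model(prompt):
    # Hypothetical stand-in for the model being evaluated.
    raise NotImplementedError("plug in a real model call here")

def map_wrongness(eval_set):
    """eval_set: iterable of (category, prompt, expected_answer) triples."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for category, prompt, expected in eval_set:
        totals[category] += 1
        if call_model(prompt).strip() != expected.strip():
            errors[category] += 1
    # Per-category error rates: a rough empirical stand-in for the
    # "wrongness distribution" discussed above.
    return {c: errors[c] / totals[c] for c in totals}
```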
Replying to @vgr
They should. Though it’s the same problem as detecting AI images: if you could make an AI that knows when it’s wrong, you could also make an AI that isn’t wrong.
someone excitedly told me that gpt3 feynman was teaching them quantum mechanics and let me tell you it was Terrifying gobbledegook
i wish i had the screenshot of it, because it was more technical -- it passed better -- than this, but this gives you the flavor of the problem.



