New paper on truthful AI!
We introduce a definition of “lying” for AI
We explore how to train truthful ML models
We propose institutions to support *standards* for truthful AI
We weigh costs/benefits (economy + AI Safety)
(w/ coauthors at Oxford & OpenAI)
arxiv.org/abs/2110.06674
Today, lying is a human problem. Over time, AI will generate an increasing share of text/speech. AI enables personalised, scalable deception, and it needs no intention to deceive: “lies” emerge from optimising text for rewards (e.g. sales, clicks, or Likes).
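A toy sketch of that mechanism (our illustration, not code from the paper; all statements and numbers below are invented):

```python
# Toy illustration: a generator optimised purely for an engagement reward
# will select a false claim whenever it scores higher. Truth never enters
# the reward signal, so no "intention" to deceive is needed.

candidates = [
    {"text": "This supplement may help some people sleep.", "truthful": True,  "clicks": 40},
    {"text": "This supplement cures insomnia overnight!",   "truthful": False, "clicks": 90},
]

def engagement_reward(candidate):
    # Reward measures clicks only; the "truthful" field is ignored.
    return candidate["clicks"]

best = max(candidates, key=engagement_reward)
print(best["text"])  # the false claim wins
```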
We introduce “negligent falsehoods” as a generalisation of lies: statements that are unacceptably likely to be false, even without any intent to deceive. Truthful AI avoids negligent falsehoods.
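A minimal toy formalisation (our own sketch; the paper’s definition is informal prose, and the 0.1 threshold here is hypothetical):

```python
# Toy check: a statement is flagged when it is "unacceptably likely to be
# false" given the evidence available to the system. No intent is required;
# the check depends only on what the system could reasonably have known.

def is_negligent_falsehood(p_false: float, threshold: float = 0.1) -> bool:
    """p_false: estimated probability the statement is false, given evidence.
    threshold: hypothetical bar for 'unacceptably likely'."""
    return p_false > threshold

print(is_negligent_falsehood(0.4))   # True: too likely to be false
print(is_negligent_falsehood(0.02))  # False: careful enough
```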
Creating AI that’s both truthful and trusted could improve human epistemics and trust among humans (as well as reduce the risk from lying AI).
Creating truthful AI is a technical challenge (e.g. finding the right ML training objective) and a governance challenge.
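One way that objective could be framed (a sketch under our own assumptions, not the paper’s proposal; the evaluator score p_false and the weight lam are hypothetical):

```python
# Sketch: add a truthfulness penalty, scored by an external evaluator,
# to the usual task loss. `p_false` would come from a hypothetical
# evaluator model that estimates whether a generated statement is false.

def truthful_training_loss(task_loss: float, p_false: float, lam: float = 5.0) -> float:
    """task_loss: the usual objective (e.g. negative log-likelihood).
    p_false: evaluator's estimate that the model's statement is false.
    lam: hypothetical weight trading task performance against truthfulness."""
    return task_loss + lam * p_false

print(truthful_training_loss(task_loss=2.3, p_false=0.4))  # 4.3
```

The hard part is building an evaluator robust enough that the model can’t satisfy it by being confidently wrong.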
We suggest *certifying* truthful AI before deployment and *adjudicating* violations after.
The paper brings together ideas from Machine Learning, AI policy, and AI safety/alignment.
There is a 7-page Executive Summary covering the key ideas.
Authors: Owain Evans, Owen Cotton-Barratt, Lukas Finnveden, Adam Bales, Avital Balwit, Peter Wills, Luca Righetti, W. Saunders
