LLMs Take the Turing Test
-Largest-scale Turing-style test ever conducted
-1.5 million humans blind chatted w/ either another human or an LLM for a couple mins
-When speaking w/ an LLM agent, humans correctly guessed it was an AI only 60% of the time
arxiv.org/abs/2305.20010
Couldn't find if they used GPT-3 or 4, but when I played it responded really fast so I assume it was 3.5-turbo and not 4?
Wonder if the results will differ w/ 4
I'm a little curious whether there was any constraint on the conversations. Otherwise detection is trivial: chatbots like ChatGPT will give answers like "As an AI-BOT I am not allowed...." or something similar.
lmao. aside from the other obvious ways this was not a turing test, this stands out:
I got 1 bot 1 human, guessed both.
This doesn't really get to the heart of the Turing test, because it's too brief.
In a good Turing test, both bot & human are trying to convince you they're human.
The conversation can be in-depth, because that is where chatbots will fail.
Is a Metaphor Cypher (hidden random vector in the word space) that distinguishes humans from AI easier to do manually or via automation?
Figure out what the fable's really about to authorize charges to your bank account.