We’re awarding prizes to 7/48 submissions to the Inverse Scaling Prize Round 2! Tasks show inverse scaling on models, often even after training with human feedback. Details at irmckenzie.co.uk/round2 and 🧵 on winners:
Conversation
We didn't find the kind of robust, major long-term-relevant problems that would have warranted a grand prize. However, these submissions represent interesting tests of practically important issues and that help contribute to our scientific understanding of language models (LMs).
1
16
🥉Modus Tollens: Infer that a claim “P” must be false, if “Q” is false and “If P then Q” is true - a classic form of logical deduction. Issue holds even after finetuning LMs w/ human feedback via RL from Human Feedback (RLHF) and Feedback Made Easy (FeedME).
3
2
41
🥉Memo Trap, by Alisa Liu & Jiacheng Liu: Write a phrase in a way that starts like a famous quote but ends differently. Larger LMs are more likely to continue with the famous quote, suggesting they struggle to avoid repeating memorized text.
3
14
96
🥉Prompt Injection: Tests for susceptibility to a form of prompt injection attack, where a user inserts new instructions for a prompted LM to follow (disregarding prior instructions from the LM’s deployers). Medium-sized LMs are oddly least susceptible to such attacks.
1
18
73
🥉Into the Unknown: Choose which of two pieces of information would help answer a question. Larger LMs choose redundant info already given to the model rather than accurately reasoning about what info would be most helpful.
1
2
30
🥉Pattern Matching Suppression: Continue text in a way that violates a repetitive pattern when instructed to do so. Inverse scaling suggests that LMs have strong pattern-matching tendencies that can inhibit their ability to follow instructions.
1
2
22
🥉Sig Figs: Round numbers to the correct number of significant figures. Some Larger LMs consistently round numbers based on the number of decimal places rather than significant figures. Suggests that LMs sometimes competently perform a different task than intended/instructed.
1
23
🥉Repetitive Algebra: Answer arithmetic Qs with few-shot Q&A examples in the prompt, designed to measure the amount of bias towards the answer in the last example. Larger LMs often are overly reliant on the last few-shot example, with the effect varying heavily by model series.
1
2
20
Thanks to all those who participated in the contest! We’ll release a paper summarizing the Round 1 & 2 results soon. For now, see our blog post on the Round 2 results to learn more about the winning submissions: irmckenzie.co.uk/round2
1
14
Modus Tollens by: , Daniel Wurgaft
Memo Trap: ,
Prompt Injection: Derik Kauffman, , , Joe Cavanagh
Into the Unknown: Max Weiss
Pattern Matching Suppression:
Sig Figs:
Repetitive Algebra: Tom Tseng
1
1
18
The Inverse Scaling Prize is run with my awesome collaborators at New York University and FAR AI (far.ai):
Ameya Prabhu
1
17
Huge shoutout to for evaluating the data quality of so many diverse datasets, with <1-2 week turnaround times and barely any effort from us! As I've said before, everyone should be using Surge for data collection
1
5
22
