John Nay
@johnjnay
Stanford fellow, AI research. // More at johnjnay.com.
New York · law.stanford.edu/directory/john… · Joined December 2015

John Nay’s Tweets

Code, data, models for Mind2Web: Towards Generalist LLM Agents on the Web: github.com/OSU-NLP-Group/
Quote Tweet
Generalist LLM Agents Completing New Tasks On The Web -2,000 open-ended tasks from 137 real-world sites -Raw HTML of sites are often too large for context -First filtering w/ a smaller LM significantly improves effectiveness & efficiency of larger LLMs arxiv.org/abs/2306.06070
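The filtering idea in the Mind2Web tweet can be sketched in a few lines: a small, cheap model ranks candidate HTML elements first, and only the shortlist reaches the large LLM's context. The `small_lm_score` below is a toy keyword-overlap stand-in for the paper's small LM, and the element strings are made up for illustration.

```python
# Two-stage candidate filtering: a cheap ranker prunes raw HTML elements
# before the large LLM ever sees them.
import re

def _words(s: str) -> set:
    return set(re.findall(r"[a-z]+", s.lower()))

def small_lm_score(task: str, element: str) -> float:
    """Toy relevance score: fraction of task words found in the element."""
    task_words = _words(task)
    return len(task_words & _words(element)) / max(len(task_words), 1)

def filter_candidates(task, elements, top_k=3):
    """Keep only the top_k elements the cheap ranker considers relevant."""
    ranked = sorted(elements, key=lambda e: small_lm_score(task, e), reverse=True)
    return ranked[:top_k]

elements = [
    '<a href="/careers">careers</a>',
    '<button id="search">search flights</button>',
    '<input name="from" placeholder="departure city">',
    '<div class="ad">subscribe now</div>',
    '<input name="date" placeholder="departure date">',
]
shortlist = filter_candidates("search flights from new york", elements, top_k=3)
# Only the shortlist (not the full raw page) goes into the large LLM's prompt.
```

The payoff claimed in the tweet comes from this pruning step: the big model's context holds a handful of plausible elements instead of the site's entire DOM.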
Code for A plug-and-play Transformer module for task-agnostic LLM reasoning: github.com/HazyResearch/T
Quote Tweet
LLMs Are Capable of Learning How to Reason in a Task-Agnostic Way -Transformer-based reasoning module trained on synthetic data -Composed w/ LLM -Improves perf across diff model types, sizes, tasks, modalities -GPT-Neo (125M) can outperform BLOOM (176B) arxiv.org/abs/2306.07536
Code for LLMs with Long-Term Memory: github.com/Victorwz/LongM
Quote Tweet
Augmenting LLMs w/ Long-Term Memory -Decoupled architecture w/ backbone LLM frozen as memory encoder & residual side-network as memory retriever -Caches & updates long-term past contexts -Outperforms strong baselines on long-context modeling benchmark arxiv.org/abs/2306.07174
Code for Eliciting Truthful Answers From LLMs: github.com/likenneth/hone
Quote Tweet
Getting LLMs To Tell The Truth -Shifts LLM activations during inference, following directions across attention heads -Improves LLaMA 33% -> 65% on TruthfulQA -LLMs may have internal representation of something being true, even as they produce falsehoods arxiv.org/abs/2306.03341
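The intervention the tweet describes boils down to a vector shift at inference time. A minimal numpy sketch, assuming a single activation vector and a single "truthful direction" (the paper learns one per attention head; the direction and scale here are illustrative values, not the learned probes):

```python
# Inference-time intervention sketch: shift an activation along a fixed
# direction during the forward pass, without changing model weights.
import numpy as np

def intervene(activation, direction, alpha=1.0):
    """Shift an activation along a unit-normalized direction, scaled by alpha."""
    unit = direction / np.linalg.norm(direction)
    return activation + alpha * unit

act = np.array([0.5, -1.0, 2.0])       # toy activation from one head
truth_dir = np.array([1.0, 0.0, 0.0])  # toy "truthful" direction
shifted = intervene(act, truth_dir, alpha=2.0)
# shifted == [2.5, -1.0, 2.0]
```

The interesting implication in the tweet is the last bullet: if such directions exist and help, the model internally represents truthfulness even when its sampled text is false.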
LLMs Peer-to-Peer Eval of Each Other -LLM Examiner probes, follows-up (scores align closely w/ human annotations) -For peer-exam, each LLM is Examiner -Combine all evals by voting -Leverages diverse LLM expertise for higher coverage & fairer assessments arxiv.org/abs/2306.04181
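The aggregation step of the peer exam is simple to sketch: every model examines every other model, and the per-examiner verdicts are combined by vote. The verdicts below are hard-coded stand-ins for real examiner outputs:

```python
# Peer-exam aggregation: combine pass/fail verdicts from multiple
# Examiner LLMs by majority vote.
from collections import Counter

def combine_by_vote(verdicts):
    """Return the majority verdict across examiners."""
    return Counter(verdicts).most_common(1)[0][0]

# verdicts[candidate] = one verdict per examiner LLM
verdicts = {
    "model_a": ["pass", "pass", "fail"],
    "model_b": ["fail", "fail", "pass"],
}
results = {m: combine_by_vote(v) for m, v in verdicts.items()}
# results == {"model_a": "pass", "model_b": "fail"}
```

Voting is what lets the scheme pool diverse examiner expertise: no single examiner's blind spot decides the outcome.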
Code for Thought Cloning: AI Agent Learning to Think while Acting by Imitating Human Thinking: github.com/ShengranHu/Tho
Quote Tweet
AI Agents Can Learn to Think While Acting -Thinking & action data synthetically generated -Thought Cloning trains on thoughts + behaviors -Faster than Behavioral Cloning & outperformance grows further out of distribution -Can steer agent through thoughts arxiv.org/abs/2306.00323
LLMs Take the Turing Test -Largest scale Turing-style test ever conducted -1.5 million humans blind chatted w/ either another human or an LLM for a couple mins -When speaking w/ LLM agent, humans guessed it was an AI correctly only 60% of the time arxiv.org/abs/2305.20010
Behavioral Game Theory for LLMs -LLMs excel where valuing their own self-interest pays off -LLMs behave sub-optimally in games that require coordination -GPT-4 acts particularly unforgivingly, always defecting after another agent has defected only once arxiv.org/abs/2305.16867
Purely Synthetic LLM Feedback Improves LLMs -Reward model trained on contrasting responses from vanilla LLM of varied size -Almost no human input -No dependency on pre-aligned LLMs -Outperforms Alpaca, Dolly, etc, which are trained on InstructGPT/humans arxiv.org/abs/2305.13735
LLM vs LLM: Detecting Errors via Cross Examining Agents -An incorrect claim is likely to result in inconsistency w/ other claims -Multi-turn interactions between LLM that generated claim and Examiner LLM -Outperforms baselines across factual benchmarks arxiv.org/abs/2305.13281
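The detection principle in the tweet (an incorrect claim breeds inconsistent follow-up answers) can be sketched with a toy examiner. The dictionaries below stand in for real multi-turn LLM interactions:

```python
# Cross-examination sketch: an Examiner asks follow-up questions and
# flags the original claim when the claimant's answers contradict.

def cross_examine(claim, followup_answers):
    """Flag the claim as suspect if any repeated question gets two answers."""
    seen = {}
    for question, answer in followup_answers:
        if question in seen and seen[question] != answer:
            return "inconsistent"
        seen[question] = answer
    return "consistent"

# A false claim tends to produce contradictory follow-up answers:
bad = cross_examine(
    "Paris is the capital of Germany",
    [("capital of Germany?", "Berlin"), ("capital of Germany?", "Paris")],
)
good = cross_examine(
    "Paris is the capital of France",
    [("capital of France?", "Paris"), ("capital of France?", "Paris")],
)
# bad == "inconsistent", good == "consistent"
```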
The more intelligent that AI gets, the more relevant that econ (Mechanism Design) gets.
Game theory might end up being existentially important w/ agentic deployments.
Code for Tree of Thoughts: Deliberate Problem Solving w/ LLMs: github.com/kyegomez/tree-
Quote Tweet
Tree of Thoughts: LLMs Deliberately Solving Problems -LLMs consider multiple reasoning paths -Self-evaluate choices to decide next course of action -Look ahead or backtrack when needed -On planning tasks: much better than GPT-4 w/ only chain-of-thought arxiv.org/abs/2305.10601
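The search loop behind Tree of Thoughts can be sketched as a breadth-first beam over partial solutions: expand several "thoughts" per step, self-evaluate each with a value function, and keep only the most promising ones (pruning a branch is the implicit backtrack). The toy task here (reach a target sum using +1/+3 steps) stands in for a real reasoning problem:

```python
# Breadth-first Tree-of-Thoughts sketch: expand, self-evaluate, prune to a beam.

def expand(state):
    """Propose successor thoughts from a partial solution."""
    total, steps = state
    return [(total + d, steps + [d]) for d in (1, 3)]

def value(state, target):
    """Self-evaluation: closer to target is better; overshooting is worthless."""
    total, _ = state
    return -abs(target - total) if total <= target else -float("inf")

def tree_of_thoughts(target, depth, beam=2):
    frontier = [(0, [])]  # (running total, steps taken so far)
    for _ in range(depth):
        candidates = [s for st in frontier for s in expand(st)]
        # Keep only the `beam` best-valued thoughts; the rest are abandoned.
        candidates.sort(key=lambda s: value(s, target), reverse=True)
        frontier = candidates[:beam]
    return frontier[0]

total, steps = tree_of_thoughts(target=7, depth=3, beam=2)
# steps sum to 7, e.g. [3, 3, 1]
```

In the paper the expand and value functions are both the LLM itself (proposing and scoring its own reasoning paths); here they are hand-written so the loop runs standalone.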