
We know language models (& audio LMs) are good at predicting fMRI brain responses, but how much better are big models than small ones? Here we show that big models (& big datasets!) make brain prediction MUCH better: arxiv.org/abs/2305.11863
Embedded video
Quote Tweet
Scaling laws for language encoding models in fMRI tested whether larger open-source models such as those from the OPT and LLaMA families are better at predicting brain responses recorded using fMRI. Mirroring scaling results from other contexts, we found that brain prediction…
Image
Scaling LM size (here within the OPT family) gives roughly log-linear improvement. Big models give ~15% boost in performance over more typical GPT/2-scale models (+22% var. exp.). (The biggest models have so many features that fMRI model fitting seems to suffer a bit, though.)
Image
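A minimal sketch of the kind of voxelwise encoding model this thread is about: ridge regression from LM features to per-voxel fMRI responses, scored by held-out correlation. Everything here is synthetic and illustrative (the shapes, names, and alpha are assumptions, not the paper's actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 500, 100, 64, 10

# Pretend these are LM hidden states aligned to fMRI time points.
X_train = rng.standard_normal((n_train, n_feat))
X_test = rng.standard_normal((n_test, n_feat))

# Simulate voxels as noisy linear functions of the features.
W_true = rng.standard_normal((n_feat, n_vox))
Y_train = X_train @ W_true + 0.5 * rng.standard_normal((n_train, n_vox))
Y_test = X_test @ W_true + 0.5 * rng.standard_normal((n_test, n_vox))

# Ridge regression in closed form: W = (X'X + alpha*I)^-1 X'Y.
alpha = 10.0
W = np.linalg.solve(X_train.T @ X_train + alpha * np.eye(n_feat),
                    X_train.T @ Y_train)

# Score each voxel by Pearson correlation on held-out data.
pred = X_test @ W
corr = np.array([np.corrcoef(pred[:, v], Y_test[:, v])[0, 1]
                 for v in range(n_vox)])
print(corr.mean())
```

In this framing, "bigger models predict the brain better" means the held-out `corr` values rise as the feature matrix comes from larger LMs.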
Very big LMs also have many layers. Which ones are best at predicting the brain? Here the x-axis ranges from the embedding layer (left) to the final layer (right). LLaMA and OPT behave quite differently, but 30B+ parameter models are definitely winning the day.
Image
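The layer comparison above amounts to fitting the same encoding model once per layer and keeping the layer with the best held-out correlation. A toy sketch with synthetic per-layer features (layer 3 is made informative on purpose; nothing here reflects the paper's real layer scores):

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_feat, n_layers = 300, 80, 32, 6

# One feature matrix per "layer"; responses depend only on layer 3.
w = rng.standard_normal(n_feat)
layers_train = [rng.standard_normal((n_train, n_feat)) for _ in range(n_layers)]
layers_test = [rng.standard_normal((n_test, n_feat)) for _ in range(n_layers)]
y_train = layers_train[3] @ w + 0.5 * rng.standard_normal(n_train)
y_test = layers_test[3] @ w + 0.5 * rng.standard_normal(n_test)

def ridge_corr(Xtr, ytr, Xte, yte, alpha=10.0):
    # Fit ridge on one layer's features, return held-out correlation.
    beta = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(Xtr.shape[1]),
                           Xtr.T @ ytr)
    return np.corrcoef(Xte @ beta, yte)[0, 1]

scores = [ridge_corr(layers_train[i], y_train, layers_test[i], y_test)
          for i in range(n_layers)]
best_layer = int(np.argmax(scores))
print(best_layer)  # layer 3 wins by construction
```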
And these models aren't just good at predicting a few parts of temporal lobe — they're good at predicting ALL of the language-responsive areas in cortex. Individual voxels in precuneus, PFC, and ITL are all predicted with correlation > 0.75. (this is insane to me)
Image
Log-linear scaling also holds for audio models (supervised and self-supervised) like WavLM & Whisper. Bigger keeps being better!
Image
And we can combine audio + text LMs to get what I think are the best-ever predictive models of language processing in the brain (using a method by ). The combined model uses the audio LM to explain auditory cortex and the text LM to explain everything else.
Image
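One simple way to read "audio LM for auditory cortex, text LM for everything else" is per-voxel model selection: fit both models and, for each voxel, keep whichever predicts held-out data better. A synthetic sketch under that assumption (not necessarily the combination method the thread refers to):

```python
import numpy as np

rng = np.random.default_rng(2)
n_test, n_vox = 100, 8

y_true = rng.standard_normal((n_test, n_vox))
# Pretend the audio model is accurate on "auditory" voxels 0-3
# (low noise) and the text model on the rest.
pred_audio = y_true + rng.standard_normal((n_test, n_vox)) * \
    np.where(np.arange(n_vox) < 4, 0.3, 2.0)
pred_text = y_true + rng.standard_normal((n_test, n_vox)) * \
    np.where(np.arange(n_vox) < 4, 2.0, 0.3)

def voxel_corr(pred, y):
    # Per-voxel Pearson correlation between predictions and responses.
    return np.array([np.corrcoef(pred[:, v], y[:, v])[0, 1]
                     for v in range(y.shape[1])])

c_audio = voxel_corr(pred_audio, y_true)
c_text = voxel_corr(pred_text, y_true)
use_audio = c_audio > c_text               # per-voxel model choice
combined = np.where(use_audio, c_audio, c_text)
print(use_audio)
```

The combined map dominates either model alone, since every voxel gets the better of the two feature spaces.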
So what does this mean? It means we can probably do a much better job decoding language if we use these big models. It also means we can better trust what these models say about language selectivity in different parts of the brain --> better brain maps!
Quote Tweet
In the latest paper from my lab, @jerryptang showed that we can decode language that a person is hearing (or even just thinking) from fMRI responses. nature.com/articles/s4159

Discover more

I am honored to have been a part of this great work. Congrats to Fan Cheng & co-authors for getting this out... In this study, she successfully reconstructed the subjective experience of visual illusions from fMRI data using deep learning models.
Quote Tweet
Reconstructing visual illusory experiences from human brain activity biorxiv.org/cgi/content/sh #biorxiv_neursci
We know that models that generate data autoregressively, and not only transformers, suffer from compounding errors. The models could, however, write to the external world and retrieve from it, or do error correction over time. Has anyone explored this seriously? Other ideas?
Quote Tweet
I like this paper. They prove that transformers are guaranteed to suffer from compounding errors when doing long reasoning chains (as @ylecun has argued), and much apparent "success" is just due to unreliable pattern matching / shortcut learning. arxiv.org/abs/2305.18654
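The compounding-error claim has a simple quantitative core: under the (idealized) assumption that each autoregressive step is correct independently with probability 1 - eps, an n-step chain succeeds with probability (1 - eps)**n, which decays exponentially in chain length.

```python
def chain_success(eps: float, n: int) -> float:
    # Probability an n-step chain has zero errors, assuming each step
    # is independently correct with probability 1 - eps.
    return (1.0 - eps) ** n

# Even a 1% per-step error rate sinks long chains:
print(chain_success(0.01, 100))  # ≈ 0.366
```

This is why external memory or periodic error correction is appealing: anything that resets or bounds the accumulated error breaks the exponential decay.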