I'm excited to share our paper “Investigating Generalization by Controlling Normalized Margin” appearing at #ICML2022!
code: github.com/alexfarhang/ma
paper: arxiv.org/abs/2205.03940
1/10
Alex Farhang’s Tweets
I'm so excited to share my PhD thesis publicly. It's 88 pages long, I wrote it from scratch, and I tried to create a useful document for anyone wanting to gain a rigorous but unfussy understanding of the mathematics of deep learning.
It's here: arxiv.org/abs/2210.10101
(1/7)
Our paper complements recent empirical studies seeking causal insights into neural net generalization. What sets it apart is direct intervention on the quantities of interest through controlled optimization. See the paper for more results! 9/10
We also point out a correspondence between ensembles of neural nets and a GP model of normalized margin. 8/10
In one case study, using this experimental control, we find particular settings that break the typical correlation of spectrally-normalized margin distributions with generalization. 7/10
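For concreteness, the "spectral" part of spectrally-normalized margin bounds is the product of per-layer operator norms (as in Bartlett et al.'s bounds). A minimal sketch of that quantity, omitting the bounds' additional correction terms:

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value (operator 2-norm) of a weight matrix.
    return np.linalg.svd(W, compute_uv=False)[0]

def spectral_complexity(weights):
    # Product of per-layer spectral norms -- the headline factor in
    # spectrally-normalized margin bounds. This is a simplified sketch;
    # the full bounds multiply in extra layer-wise correction terms.
    out = 1.0
    for W in weights:
        out *= spectral_norm(W)
    return out
```

Dividing margins by this quantity is what makes the resulting distribution scale-invariant, which is why it is the usual candidate for correlating with generalization.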
However, in our paper, we show that we actually *can* directly intervene on these quantities using our proposed normalized margin control. 6/10
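To illustrate what "normalized margin" means here, one common choice (a sketch, not necessarily the paper's exact normalizer) divides the raw margin of a homogeneous net by the product of its layer norms, making the quantity invariant to rescaling the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((4, 3))  # toy two-layer ReLU net
W2 = rng.standard_normal((2, 4))

def normalized_margin(x, label, W1, W2):
    # Raw margin divided by the product of layer Frobenius norms -- one
    # illustrative normalizer; the paper's definition may differ.
    h = np.maximum(W1 @ x, 0.0)
    logits = W2 @ h
    margin = logits[label] - np.max(np.delete(logits, label))
    return margin / (np.linalg.norm(W1) * np.linalg.norm(W2))
```

Because ReLU nets are positively homogeneous, scaling any layer by c > 0 scales both the margin and the normalizer by c, so the normalized margin is unchanged; it is this invariant quantity that the proposed method holds fixed during training.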
Since we can’t typically intervene on these quantities, it’s hard to nail down whether they really do have a causal connection with generalization. 5/10
Another potential explanation is weight norm. 4/10
One potential explanation is margin. 3/10
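The margin in question is the standard classification margin: the correct-class output minus the largest competing output. A minimal sketch:

```python
import numpy as np

def margins(logits, labels):
    """Per-example classification margin: correct-class logit minus the
    largest competing logit. Positive means correctly classified; larger
    values mean a more confident decision."""
    logits = np.asarray(logits, dtype=float)
    idx = np.arange(len(labels))
    correct = logits[idx, labels]
    masked = logits.copy()
    masked[idx, labels] = -np.inf  # exclude the correct class
    return correct - masked.max(axis=1)

print(margins([[2.0, 0.5, -1.0], [0.1, 0.3, 0.0]], [0, 2]))  # [ 1.5 -0.3]
```

The intuition is that larger margins on the training set leave more room for the decision boundary to absorb distribution shift between train and test.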
Quote Tweet
1/3 Re "grokking": log x-scale is misleading, see below.
But also 5y ago it was shown in arxiv.org/abs/1710.10345 that (I skip details) SGD converges to the max-margin solution; and max-margin is linked to good generalization.
So I really don't think "grokking" is surprising...
Neural networks generalize in a remarkable way and researchers are not sure why. 2/10
Hello! I wanted to share some of our new results:
"Kernel interpolation as a Bayes point machine"
w/ collaborators
paper: arxiv.org/abs/2110.04274
I believe that this paper gives a simple explanation for generalization in NN classification. 1/6
Deep thoughts about life, information, computation, evolution - an intriguing essay for Quanta Magazine