Alex Farhang
@alex_farhang
PhD student at Caltech
alexfarhang.github.io · Joined December 2016

Alex Farhang’s Tweets

Our paper complements recent empirical studies seeking causal insight into neural net generalization. What sets our paper apart is direct intervention on the quantities of interest through controlled optimization. See our paper for more interesting results! 9/10
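The thread doesn't spell out the intervention itself, but the general idea of directly controlling a quantity during training can be sketched with projected gradient descent. A minimal, hypothetical sketch, assuming the controlled quantity is the weight 2-norm (the toy data and function names are illustrative assumptions, not the paper's method):

```python
# Hypothetical sketch: intervene on the weight 2-norm via projected gradient
# descent, so models that differ only in ||w||_2 can be compared directly.
# NOT the paper's method; toy logistic regression for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))            # toy inputs
y = np.sign(X @ rng.normal(size=20))      # labels from a random linear rule

def train_at_fixed_norm(target_norm, steps=2000, lr=0.1):
    """Gradient descent on the logistic loss, projecting w back to a fixed 2-norm."""
    w = rng.normal(size=X.shape[1])
    w *= target_norm / np.linalg.norm(w)
    for _ in range(steps):
        s = y * (X @ w)                                           # per-example margins
        grad = -(X.T @ (y * np.exp(-np.logaddexp(0.0, s)))) / len(y)  # stable logistic gradient
        w -= lr * grad
        w *= target_norm / np.linalg.norm(w)                      # project back to the sphere
    return w

for norm in (0.5, 2.0, 8.0):
    w = train_at_fixed_norm(norm)
    print(f"||w|| = {norm:4.1f}: train accuracy = {np.mean(np.sign(X @ w) == y):.2f}")
```

Rescaling after every step pins ||w||_2 at the target while the loss is otherwise optimized freely; that is one way to realize a "direct intervention through controlled optimization".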
Another potential explanation is weight norm. 4/10
Quote Tweet
The double descent curves are intriguing, but do you, like me, also somewhat miss the classical U-curves? One possible option: change the x-axis of your visualization. Below, the x-axis is the weight 2-norm, and the color and arrow indicate how the number of parameters changes.
[Image: double descent curves replotted with the weight 2-norm on the x-axis]
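One way to read the quoted suggestion: train the usual sweep of models, but record each one's weight 2-norm and sort by that instead of by parameter count. A minimal sketch using ridgeless random-feature regression, a standard toy setting where double descent appears (all names and settings are illustrative, not the quoted author's code):

```python
# Sketch of the replotting idea: instead of test error vs. parameter count,
# record each trained model's weight 2-norm and use that as the x-axis.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d = 100, 500, 10
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
beta = rng.normal(size=d)
y_tr, y_te = X_tr @ beta, X_te @ beta                # noiseless linear target

points = []
for width in (10, 50, 90, 100, 110, 200, 800):       # sweeps past n_train
    W = rng.normal(size=(d, width)) / np.sqrt(d)     # fixed random features
    F_tr, F_te = np.tanh(X_tr @ W), np.tanh(X_te @ W)
    w, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)  # min-norm least-squares fit
    test_mse = np.mean((F_te @ w - y_te) ** 2)
    points.append((np.linalg.norm(w), width, test_mse))

for norm, width, mse in sorted(points):              # x-axis: ||w||_2, not width
    print(f"||w|| = {norm:8.2f}  (width {width:4d})  test MSE = {mse:.3f}")
```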
One potential explanation is margin. 3/10
Quote Tweet
1/3 Re "grokking": the log x-scale is misleading, see below. But also, five years ago it was shown in arxiv.org/abs/1710.10345 that (skipping details) SGD converges to the max-margin solution, and max-margin is linked to good generalization. So I really don't think "grokking" is surprising...
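A minimal sketch of the cited implicit-bias result: on linearly separable data, gradient descent on the logistic loss grows ||w|| without bound while w/||w|| approaches the max-margin direction, so the normalized margin keeps climbing. The 2-D toy data and step size are illustrative assumptions:

```python
# Sketch of the result in arxiv.org/abs/1710.10345: on separable data,
# gradient descent on the logistic loss grows ||w|| without bound while the
# normalized margin min_i y_i <x_i, w> / ||w|| climbs toward the max margin.
# Toy data and step size are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)
n = 100
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(size=(n, 2)) + 5.0 * y[:, None] * np.array([1.0, 0.0])  # separable classes

w = np.zeros(2)
for step in range(1, 100_001):
    s = y * (X @ w)                                          # per-example margins
    grad = -(X.T @ (y * np.exp(-np.logaddexp(0.0, s)))) / n  # stable logistic gradient
    w -= 0.1 * grad
    if step in (100, 1_000, 10_000, 100_000):
        m = np.min(y * (X @ w)) / np.linalg.norm(w)          # normalized margin
        print(f"step {step:>6}: ||w|| = {np.linalg.norm(w):7.2f}, normalized margin = {m:.4f}")
```

The norm grows roughly like log t while the direction stabilizes, which is consistent with the quoted claim that a log x-axis can make slow margin maximization look like a sudden transition.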