Stating the obvious: a lot of current deep learning tricks are overfit to the validation sets of well-known benchmarks, including CIFAR10. It's nice to see this quantified. This has been a problem with ImageNet since at least 2015. https://arxiv.org/abs/1806.00451
Deep learning research, along many axes, does not follow the scientific method. Validation-set overfitting is one notable example. Others include: the use of weak benchmarks; the impossibility of clearly linking an empirical result to the big idea proposed by a paper (because of too many moving parts)...
...the poor reproducibility of most papers; post-hoc selection of results; and the lack of significance testing when comparing (often high-variance) empirical results...
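To make the significance-testing point concrete, here is a minimal sketch (not from the thread) of how one might compare two models whose scores vary across random seeds, using a Welch's t-test from scipy. The model names and accuracy values are invented purely for illustration.

# Hypothetical illustration: the accuracy values below are made up.
import numpy as np
from scipy import stats

# Test accuracy of model A and model B over 10 independent training runs (different seeds).
acc_a = np.array([0.912, 0.905, 0.918, 0.909, 0.915, 0.907, 0.913, 0.910, 0.916, 0.908])
acc_b = np.array([0.915, 0.903, 0.921, 0.911, 0.917, 0.905, 0.914, 0.912, 0.919, 0.906])

# Welch's t-test: is the mean gap larger than run-to-run variance would explain?
t_stat, p_value = stats.ttest_ind(acc_a, acc_b, equal_var=False)
print(f"mean A = {acc_a.mean():.4f}, mean B = {acc_b.mean():.4f}, p = {p_value:.3f}")
# A large p-value means the observed gap is indistinguishable from seed-to-seed noise.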
If you're doing a Kaggle competition and you're evaluating your models / ideas according to a fixed validation split of the training data (plus the public leaderboard), you will consistently underperform on the private leaderboard. The same is true in research at large.
Here is a very simple recommendation to help you overcome this: use a higher-entropy validation process, such as k-fold validation, or even better, iterated k-fold validation with shuffling. Only check your results on the official validation set at the very end.
Yes, it's more expensive, but that cost itself is a regularization factor: it will force you to try fewer ideas instead of throwing spaghetti at the wall and seeing what sticks.
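To illustrate the recommendation above, here is a minimal sketch of iterated k-fold validation with shuffling, using scikit-learn's RepeatedStratifiedKFold. The dataset and model are placeholders (not from the thread); swap in your own. The official validation or test set stays untouched until the very end.

# A minimal sketch of iterated k-fold validation with shuffling.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Placeholder data and model, for illustration only.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

# 5-fold CV repeated 3 times; each repetition reshuffles the data before splitting,
# so every idea is scored against 15 different train/validation partitions.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Report mean and spread -- the spread is exactly what a single fixed split hides.
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f} over {len(scores)} folds")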
I wish managers here understood this :D
The "Guilty pleasure" of ML