If publishing your paper requires you to select specific ideas, architectures, and hyperparameters according to a fixed validation set, then it's no longer a validation set, it's a training set. And there are no guarantees that the selected ideas will generalize to real data.
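This selection effect can be shown with a toy experiment (a hypothetical setup, not from the thread): if you pick the best of many purely random "ideas" on one fixed validation set, the winner's validation score looks good, but a fresh test set reveals chance-level performance.

```python
# Toy demonstration of validation-set overfitting through selection.
# All names here are illustrative; no real model is trained.
import random

rng = random.Random(42)
n_val, n_test, n_ideas = 200, 200, 50

# Random binary labels for a fixed validation set and a held-out test set.
val_labels = [rng.randint(0, 1) for _ in range(n_val)]
test_labels = [rng.randint(0, 1) for _ in range(n_test)]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Each "idea" is pure noise: none can truly beat 50% accuracy.
ideas = [[rng.randint(0, 1) for _ in range(n_val + n_test)]
         for _ in range(n_ideas)]

# Select the idea that looks best on the fixed validation set...
best = max(ideas, key=lambda p: accuracy(p[:n_val], val_labels))
val_acc = accuracy(best[:n_val], val_labels)
test_acc = accuracy(best[n_val:], test_labels)
# ...its validation accuracy is inflated above 50%, while its test
# accuracy stays near chance: the validation set has been trained on.
```

The gap between `val_acc` and `test_acc` is exactly the Kaggle public/private leaderboard gap described later in the thread.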
Deep learning research, along so many axes, does not follow the scientific method. Validation set overfitting is a notable point. Also: the use of weak benchmarks; the impossibility of clearly linking an empirical result to the big idea a paper proposes (because of too many moving parts)...
...issues with reproducibility of most papers; post-selection of results; lack of significance testing when comparing (often high-variance) empirical results...
If you're doing a Kaggle competition and you're evaluating your models / ideas according to a fixed validation split of the training data (plus the public leaderboard), you will consistently underperform on the private leaderboard. The same is true in research at large.
Here is a very simple recommendation to help you overcome this: use a higher-entropy validation process, such as k-fold validation, or even better, iterated k-fold validation with shuffling. Only check your results on the official validation set at the very end.
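The recommendation above can be sketched in plain Python (a minimal illustration using only the standard library; `evaluate` is a hypothetical train-and-score function you would supply):

```python
# Iterated k-fold validation with shuffling: a higher-entropy alternative
# to a single fixed validation split.
import random
from statistics import mean

def iterated_kfold(n_samples, evaluate, k=5, iterations=3, seed=0):
    """Average score over `iterations` independently shuffled k-fold passes.

    `evaluate(train_idx, val_idx)` is a user-supplied function that trains
    a model on `train_idx` and returns its score on `val_idx`.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(iterations):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # reshuffle before each k-fold pass
        fold_size = n_samples // k
        for fold in range(k):
            start = fold * fold_size
            end = n_samples if fold == k - 1 else start + fold_size
            val_idx = indices[start:end]
            train_idx = indices[:start] + indices[end:]
            scores.append(evaluate(train_idx, val_idx))
    return mean(scores)  # noisier folds, but a less overfit estimate
```

With scikit-learn, `sklearn.model_selection.RepeatedKFold` implements the same splitting scheme. Either way, the official validation set is touched only once, at the very end.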
Yes, it's more expensive, but that cost itself is a regularization factor: it will force you to try fewer ideas instead of throwing spaghetti at the wall and seeing what sticks.
But NNs are exactly that: approximations of f(x) that give fairly good results on preselected x->y pairs.
On the other hand, the fact that the relative ordering between the models has stayed essentially intact is a Good Thing. Especially for those of us who need to select models for practical work from the bewildering smorgasbord out there.
Thread by @fchollet: "Stating the obvious: a lot of current deep learning tricks are overfit to the validation sets of well-known benchmarks, […]" https://threadreaderapp.com/thread/1003718133721387008.html