Stating the obvious: a lot of current deep learning tricks are overfit to the validation sets of well-known benchmarks, including CIFAR10. It's nice to see this quantified. This has been a problem with ImageNet since at least 2015. https://arxiv.org/abs/1806.00451
...issues with reproducibility of most papers; post-selection of results; lack of significance testing when comparing (often high-variance) empirical results...
If you're doing a Kaggle competition and you're evaluating your models / ideas according to a fixed validation split of the training data (plus the public leaderboard), you will consistently underperform on the private leaderboard. The same is true in research at large.
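A minimal simulation (not from the thread, all numbers hypothetical) of why this happens: if many candidate "ideas" with the same true skill are scored against one fixed, noisy validation split, the one that looks best there is mostly the one that got lucky on that split, and it falls back toward its true skill on fresh data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ideas, noise = 200, 0.02   # hypothetical values, chosen for illustration
true_skill = 0.80            # every candidate idea has the same true accuracy

# Scores on the fixed validation split (public leaderboard) vs. fresh data (private leaderboard).
val_scores = true_skill + noise * rng.standard_normal(n_ideas)
test_scores = true_skill + noise * rng.standard_normal(n_ideas)

best = np.argmax(val_scores)
print(f"selected idea, validation score: {val_scores[best]:.3f}")  # optimistically high
print(f"same idea, held-out score:       {test_scores[best]:.3f}")  # back near 0.80
```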
Here is a very simple recommendation to help you overcome this: use a higher-entropy validation process, such as k-fold validation, or even better, iterated k-fold validation with shuffling. Only check your results on the official validation set at the very end.
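A sketch of that higher-entropy validation process, assuming a scikit-learn workflow; the dataset and model here are stand-ins, not anything referenced in the thread. `RepeatedKFold` runs k-fold several times with a different shuffle each repeat, which is one way to implement iterated k-fold validation with shuffling.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=1000, random_state=0)  # stand-in dataset
model = LogisticRegression(max_iter=1000)                   # stand-in model

# 5 folds, repeated 3 times, reshuffled on each repeat: 15 fits in total.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

# Compare ideas using the mean and spread across all folds and repeats;
# only touch the official validation/test set once, at the very end.
print(f"{scores.mean():.3f} +/- {scores.std():.3f}")
```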
Yes, it's more expensive, but that cost itself is a regularization factor: it will force you to try fewer ideas instead of throwing spaghetti at the wall and seeing what sticks.
@fchollet @FPiednoel Sorry Mr. Chollet, but what do you mean by "deep learning research... doesn't follow the scientific method?" Thanks.
Though the same could be said of a lot of published research in traditional scientific fields.