Snapshot ensembles - a new take on an old idea... I'd like to see a comparison with Polyak averaging. http://openreview.net/forum?id=BJYwwY9ll …
Replying to @fchollet
building an ensemble out of multiple checkpoints isn't new: see https://arxiv.org/abs/1412.2007 , https://arxiv.org/abs/1606.02891 and others
Replying to @kchonyc
I was doing it a few years back when I started out in DL, but I stopped after I learned about Polyak averaging. Works much better.
Replying to @fchollet
how do you do it? taking the average of all steps, only the last k steps, ...?
Replying to @kchonyc
at inference time, use a slow exponential average of the weights taken after every previous batch. Easy to accumulate in TF/Theano
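A minimal sketch of the weight averaging described in this reply, written in plain NumPy rather than TF or Theano. The helper name `ema_update`, the `decay` values, and the toy regression problem are illustrative assumptions, not anything taken from the thread. Note also that classical Polyak averaging is usually stated as a uniform average of the iterates; the exponential variant below follows the description in the reply.

```python
import numpy as np


def ema_update(avg_weights, weights, decay=0.999):
    """Return the running exponential average after seeing the latest weights.

    `decay` close to 1 gives the "slow" average mentioned above
    (hypothetical default, tune per problem).
    """
    return {
        name: decay * avg_weights[name] + (1.0 - decay) * weights[name]
        for name in weights
    }


# Toy problem (assumption for illustration): noisy linear regression.
rng = np.random.default_rng(0)
X = rng.normal(size=(512, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=512)

weights = {"w": np.zeros(4)}              # weights updated by SGD
avg_weights = {"w": weights["w"].copy()}  # shadow copy used at inference time

for step in range(2000):
    idx = rng.integers(0, 512, size=16)   # sample one minibatch
    residual = X[idx] @ weights["w"] - y[idx]
    grad = 2.0 * X[idx].T @ residual / 16
    weights["w"] = weights["w"] - 0.05 * grad
    # Accumulate the average after every batch.
    avg_weights = ema_update(avg_weights, weights, decay=0.99)

# At inference time, predict with avg_weights["w"] instead of weights["w"].
predictions = X @ avg_weights["w"]
```

Only one extra copy of the weights is kept, which is why this is cheap to accumulate compared with storing several checkpoints for an ensemble.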
do you have a reference?
This is what I usually cite: http://epubs.siam.org/doi/abs/10.1137/0330046?journalCode=sjcodc …