In LSTMs, why have separate forget and write gates? Why not set forget = (1 - write)? An experiment on IMDB shows no accuracy difference, but it trains faster.
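To make the question concrete, here is a minimal NumPy sketch (not the author's experiment code) of a single LSTM step with both variants: the standard update c_t = f*c_prev + i*g with separate forget and input (write) gates, and the coupled variant where the write gate is tied to (1 - f). The function name `lstm_step`, the `coupled` flag, and the omission of biases are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, coupled=False):
    """One LSTM time step (biases omitted for brevity).

    Standard:  c_t = f * c_prev + i * g        (separate forget and write gates)
    Coupled:   c_t = f * c_prev + (1 - f) * g  (write gate tied to forget gate)
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev])   # stacked gate pre-activations, shape (4H,)
    f = sigmoid(z[0:H])                   # forget gate
    i = sigmoid(z[H:2*H])                 # input (write) gate; unused if coupled
    o = sigmoid(z[2*H:3*H])               # output gate
    g = np.tanh(z[3*H:4*H])               # candidate cell values

    write = (1.0 - f) if coupled else i   # coupling drops the input-gate parameters
    c = f * c_prev + write * g
    h = o * np.tanh(c)
    return h, c

# Example usage with arbitrary sizes
H, D = 4, 3
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D + H)) * 0.1
x, h0, c0 = rng.standard_normal(D), np.zeros(H), np.zeros(H)
h1, c1 = lstm_step(x, h0, c0, W, coupled=True)
```

In the coupled variant the input-gate block of W is never used, so that parameter block (and its gradient computation) can be dropped entirely, which is where the speedup in the tweet plausibly comes from.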
Replying to @rasmusbergpalm
@rasmusbergpalm The IMDB dataset is too small to fully utilize an LSTM...
Replying to @rasmusbergpalm
@rasmusbergpalm I'd say the probability distributions for write and forget are generally different (not opposite), hence the two separate weight matrices.
8:28 AM - 30 Apr 2015