In LSTMs, why have separate forget and write gates? Why not set forget = (1 - write)? Experiments on IMDB show no accuracy difference, but it's faster.
@fchollet Yeah, probably, but theoretically, why would you want to forget without writing, and vice versa?
@rasmusbergpalm I'd say the probability distributions for write and forget are generally different (not opposite), hence 2 weight matrices.
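For concreteness, here is a minimal NumPy sketch of the two cell updates being compared: a vanilla LSTM step with independently parameterized forget (f) and write (i) gates, and the coupled variant proposed in the question, where forget = 1 - write. The parameter names (W_f, b_i, ...) and the unbatched, single-vector setup are illustrative assumptions, not any particular library's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, p):
    """Vanilla LSTM step: forget and write gates have separate weight matrices."""
    z = np.concatenate([x, h])
    f = sigmoid(p["W_f"] @ z + p["b_f"])   # forget gate: how much old memory to keep
    i = sigmoid(p["W_i"] @ z + p["b_i"])   # write (input) gate: how much new content to add
    g = np.tanh(p["W_g"] @ z + p["b_g"])   # candidate cell content
    o = sigmoid(p["W_o"] @ z + p["b_o"])   # output gate
    c = f * c + i * g                      # f and i are free to both be ~1 or both ~0
    h = o * np.tanh(c)
    return h, c

def coupled_lstm_step(x, h, c, p):
    """Coupled variant from the question: forget = 1 - write.
    W_f and b_f disappear, so there is one fewer weight matrix to train,
    but old and new content now trade off exactly: the cell can no longer
    accumulate (keep old AND write new) or reset without writing."""
    z = np.concatenate([x, h])
    i = sigmoid(p["W_i"] @ z + p["b_i"])
    g = np.tanh(p["W_g"] @ z + p["b_g"])
    o = sigmoid(p["W_o"] @ z + p["b_o"])
    c = (1.0 - i) * c + i * g
    h = o * np.tanh(c)
    return h, c

# Toy usage: thread (h, c) through a short random sequence.
rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
p = {k: rng.normal(0.0, 0.1, (n_hid, n_in + n_hid)) for k in ("W_f", "W_i", "W_g", "W_o")}
p.update({k: np.zeros(n_hid) for k in ("b_f", "b_i", "b_g", "b_o")})
h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, p)
```

The coupled update is essentially the "coupled input and forget gate" (CIFG) variant examined in Greff et al.'s "LSTM: A Search Space Odyssey" (the GRU makes the same trade-off with its update gate); that study likewise found it performs on par with the vanilla cell while using fewer parameters, consistent with the IMDB observation above. What it gives up is exactly the independence the replies point at: with separate weight matrices, f and i can both saturate near 1 (keep old memory while writing new content) or both near 0 (clear memory without writing anything).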