https://arxiv.org/abs/1712.05134
"BT-LSTM utilizes 17,388 times fewer parameters than the standard LSTM to achieve an accuracy improvement over 15.6% in the Action Recognition task on the UCF11 dataset."
Okay, this sounds impossibly good to me...
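For a sense of scale on that parameter claim, here is a back-of-the-envelope count. The dimensions below are illustrative assumptions (a flattened 160x120x3 frame fed to an LSTM with a 256-unit hidden state), not necessarily the paper's exact configuration:

```python
# Rough parameter count for the input-to-hidden weights of an LSTM on
# high-dimensional frame inputs. All dimensions are illustrative assumptions.
input_dim = 160 * 120 * 3    # flattened RGB frame per time step (assumed)
hidden_dim = 256             # LSTM hidden size (assumed)

# A standard LSTM has four gates, each with a dense input-to-hidden matrix.
dense_params = 4 * input_dim * hidden_dim
print(f"dense input-to-hidden params: {dense_params:,}")        # ~59 million

# A crude rank-r factorization W ~ U @ V of that (4*hidden, input) block
# needs only r * (4*hidden + input) parameters.
rank = 4
factorized_params = rank * (4 * hidden_dim + input_dim)
print(f"rank-{rank} factorized params: {factorized_params:,}")  # ~235 thousand
print(f"reduction: {dense_params / factorized_params:,.0f}x")
```

The paper's block-term tensor decomposition compresses far more aggressively than this simple low-rank example, which is how a reduction of that order of magnitude becomes arithmetically possible when the input-to-hidden block dominates the parameter count.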
Replying to @evolvingstuff
Having played with factorized weight matrices for RNNs, I'm also a bit skeptical. It could be that the prior introduced by factorization is extremely good for this particular task. No free lunch!
Replying to @hardmaru @evolvingstuff
Factorized kernels serve as a regularization mechanism. There are plenty of situations where introducing regularization leads to a big improvement -- compared to terrible, overfit baselines.
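To make the factorization-as-regularization point concrete, here is a minimal sketch of a vanilla RNN step whose input-to-hidden kernel is replaced by a rank-r factorization. This is an illustrative toy, not the paper's BT-LSTM (which uses a block-term tensor decomposition), and all dimensions are assumptions:

```python
import numpy as np

# Minimal sketch: a rank-r factorized input-to-hidden kernel W ~ U @ V.
# The low rank both shrinks the parameter count and restricts the maps the
# layer can represent, which is the regularization effect discussed above.
rng = np.random.default_rng(0)
input_dim, hidden_dim, rank = 1024, 256, 8   # illustrative sizes

U = rng.normal(0, 0.02, size=(hidden_dim, rank))      # factor 1
V = rng.normal(0, 0.02, size=(rank, input_dim))       # factor 2
W_hh = rng.normal(0, 0.02, size=(hidden_dim, hidden_dim))  # dense recurrent kernel
b = np.zeros(hidden_dim)

def rnn_step(x, h):
    """One vanilla-RNN step with a factorized input-to-hidden kernel."""
    return np.tanh(U @ (V @ x) + W_hh @ h + b)

h = np.zeros(hidden_dim)
for t in range(10):                      # toy sequence of random inputs
    h = rnn_step(rng.normal(size=input_dim), h)

full = hidden_dim * input_dim
fact = rank * (hidden_dim + input_dim)
print(f"input-to-hidden params: dense {full:,} vs factorized {fact:,}")
```

The rank bottleneck constrains the hypothesis space, which is exactly the kind of prior that can either help a lot or hurt, depending on the task.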