I don't recall whether there has been any rigorous study of conv-act-bn versus conv-bn-act (I always do conv-bn-act, since it's the most intuitive to me). Does anyone have a good reference?
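For concreteness, here is a minimal PyTorch sketch of the two orderings being compared (the block names are mine, not from the thread):

```python
import torch.nn as nn

def conv_bn_act(c_in, c_out):
    # conv -> BN -> ReLU: the "intuitive" ordering mentioned above
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

def conv_act_bn(c_in, c_out):
    # conv -> ReLU -> BN: the alternative ordering
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, bias=False),
        nn.ReLU(inplace=True),
        nn.BatchNorm2d(c_out),
    )
```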
-
But the asymmetric activation immediately throws away the nice unit Gaussian you're getting from BN. Shouldn't you want the output of ReLU to be Gaussian, for all the usual reasons we like BN?
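As a quick sanity check of that point (my own illustration, not from the thread): pass a unit Gaussian through ReLU and the mean and variance shift away from 0 and 1.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(1_000_000)  # ~ N(0, 1), as BN would produce
r = np.maximum(z, 0.0)              # ReLU

# Analytically: E[ReLU(z)] = 1/sqrt(2*pi) ~= 0.399,
# Var[ReLU(z)] = 1/2 - 1/(2*pi) ~= 0.341
print(f"mean {r.mean():.3f}, var {r.var():.3f}")
```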
-
There are arguments both ways, so in the absence of reliable theory I'm looking for systematic empirical evidence.