In the stuff I've worked on, I've found conv-bn-act to work better. I think that's the ordering the original BN paper used.
-
Why do you find conv-bn-act more intuitive?
-
Because BN centers its output on 0, and your activation is usually a ReLU (though the intuition would hold for any asymmetric activation).
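
For concreteness, here is what the two orderings being debated look like as PyTorch blocks (a minimal sketch; the helper names conv_bn_act / conv_act_bn are just illustrative, not from the thread):

    import torch
    from torch import nn

    # conv-bn-act: normalize before the nonlinearity, so the ReLU sees
    # roughly zero-centered inputs rather than arbitrary pre-activations.
    def conv_bn_act(in_ch, out_ch):
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),  # bias is redundant before BN
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    # The alternative ordering mentioned below, for comparison.
    def conv_act_bn(in_ch, out_ch):
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(out_ch),
        )

    x = torch.randn(8, 3, 32, 32)
    print(conv_bn_act(3, 16)(x).shape)  # torch.Size([8, 16, 32, 32])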
New conversation -
It very much depends on the activation function, but in most cases you want conv-bn-act. Without BN before the activation, saturated neurons will kill gradients. We do case studies of this across multiple activation functions in these slides: http://cs231n.stanford.edu/slides/2020/lecture_7.pdf
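
A quick numerical sketch of that saturation point (assuming PyTorch and tanh as the saturating activation; the specific values are only illustrative): pre-activations far from zero give near-zero gradients, while the same values standardized to zero mean and unit variance, which is roughly what BN does per channel, keep the gradient alive.

    import torch

    # Large, un-normalized pre-activations: tanh saturates and its local gradient vanishes.
    pre = torch.tensor([5.0, -6.0, 7.0], requires_grad=True)
    torch.tanh(pre).sum().backward()
    print(pre.grad)  # roughly [1.8e-04, 2.5e-05, 3.3e-06] -- essentially dead

    # The same values standardized to zero mean / unit variance (mean 2, std 7
    # for these three numbers), taken as the new pre-activations:
    normed = ((torch.tensor([5.0, -6.0, 7.0]) - 2.0) / 7.0).requires_grad_()
    torch.tanh(normed).sum().backward()
    print(normed.grad)  # roughly [0.84, 0.34, 0.62] -- gradient still flows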
New conversation -
I've always done conv-act-bn; I think that makes more sense somehow.
-
My personal workflow is: try to remember the right answer, use Google, re-read this one Stack Overflow comment quoting you quoting Szegedy, remember it is unconvincing, test both variations.
https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout#comment78277697_40295999
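
A tiny harness for that "test both variations" step (a sketch, assuming PyTorch; the toy data, block shapes, and step count are placeholders, not a real benchmark):

    import torch
    from torch import nn

    def make_block(in_ch, out_ch, order="conv-bn-act"):
        conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        bn, act = nn.BatchNorm2d(out_ch), nn.ReLU()
        layers = [conv, bn, act] if order == "conv-bn-act" else [conv, act, bn]
        return nn.Sequential(*layers)

    def quick_trial(order, steps=200):
        torch.manual_seed(0)  # same init and same toy data for both orderings
        model = nn.Sequential(make_block(3, 16, order), nn.Flatten(), nn.Linear(16 * 8 * 8, 10))
        opt = torch.optim.SGD(model.parameters(), lr=0.1)
        x, y = torch.randn(64, 3, 8, 8), torch.randint(0, 10, (64,))  # placeholder data
        for _ in range(steps):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(model(x), y)
            loss.backward()
            opt.step()
        return loss.item()

    for order in ("conv-bn-act", "conv-act-bn"):
        print(order, quick_trial(order))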
-
Yeah, the whole field makes no sense. None of us even knows why BN works in the first place (the original narrative, "it reduces internal covariate shift", while intuitive, is almost certainly wrong).
New conversation -
Anybody have recent data on Swish?