@FioraAeterna The "kernels" are distinct 3x3 convolutions. So there are 128x128 independent 3x3 kernels, one for each (input, output) channel pair.
@FioraAeterna Well, unless the CUDA version does magic, there's clearly *some* way of speeding this up...
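A minimal sketch of the structure described above, assuming 128 input and 128 output channels, a hypothetical spatial size, and NumPy purely for illustration (this is not the original code's layout):

```python
import numpy as np

# Assumed shapes for illustration: 128 input channels, 128 output channels,
# one independent 3x3 kernel per (input, output) channel pair.
C_IN, C_OUT, K = 128, 128, 3
H, W = 64, 64  # hypothetical spatial size

kernels = np.random.randn(C_OUT, C_IN, K, K).astype(np.float32)
x = np.random.randn(C_IN, H, W).astype(np.float32)

# Direct convolution, "valid" padding: every output channel accumulates
# a separate 3x3 convolution of every input channel.
out = np.zeros((C_OUT, H - K + 1, W - K + 1), dtype=np.float32)
for o in range(C_OUT):      # output-major loop order, one possible way to split the work
    for i in range(C_IN):
        for dy in range(K):
            for dx in range(K):
                out[o] += kernels[o, i, dy, dx] * \
                    x[i, dy:dy + H - K + 1, dx:dx + W - K + 1]
```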
@marcan42 do you know if the CUDA version is structurally the same (i.e. same way of dividing up the problem)?
@FioraAeterna It computes the same thing, but I don't know how it divides up the big mesh (input-major, output-major, etc.)
@marcan42 @FioraAeterna cuDNN doesn't do direct convolution http://devblogs.nvidia.com/parallelforall/cudnn-v2-higher-performance-deep-learning-gpus/ - maybe the API/docs mention some more implementation detail
@DrDaxxy @FioraAeterna http://arxiv.org/pdf/1410.0759.pdf Found the paper. Reading time!
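The linked paper frames convolution as a matrix multiplication rather than direct convolution. A rough sketch of that idea using an explicit im2col lowering (which cuDNN itself avoids materializing in full), reusing the hypothetical shapes from the earlier sketch:

```python
import numpy as np

# Sketch of the convolution-as-GEMM reformulation; shapes are assumptions.
C_IN, C_OUT, K = 128, 128, 3
H, W = 64, 64
OH, OW = H - K + 1, W - K + 1

kernels = np.random.randn(C_OUT, C_IN, K, K).astype(np.float32)
x = np.random.randn(C_IN, H, W).astype(np.float32)

# im2col: each output position becomes one column of C_IN*K*K input values.
cols = np.empty((C_IN * K * K, OH * OW), dtype=np.float32)
idx = 0
for i in range(C_IN):
    for dy in range(K):
        for dx in range(K):
            cols[idx] = x[i, dy:dy + OH, dx:dx + OW].ravel()
            idx += 1

# One big matrix multiply replaces all the nested convolution loops.
out = (kernels.reshape(C_OUT, C_IN * K * K) @ cols).reshape(C_OUT, OH, OW)
```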