@marcan42 *earperk* optimizing GPU code? do tell!
@FioraAeterna https://github.com/marcan/cl-waifu2x … OpenCL is hard and Nvidia makes it harder by not having a profiler...
@marcan42 and the variable trip count loops make me, as a compiler person, REALLY wish those could be constants...
@FioraAeterna Thankfully none of that is performance-sensitive code, the OpenCL kernel is all that matters :-)
@marcan42 oh gods that is a horrifying ratio of memory reads to arithmetic
@marcan42 I look at https://github.com/marcan/cl-waifu2x/commit/89176cc52b99ea7bdb0b9b71589379f3d751fe95#diff-d2a065162d2048b6908eee284e028eafR24 … and recoil in horror
@marcan42 are those at least in local memory, or are these all global reads... *sob*
@FioraAeterna Global... I've been trying to use local memory to reuse data but everything I try makes it *slower*. *sob*
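To put a rough number on what local-memory reuse could save, here is a minimal counting sketch. It assumes 3x3 kernels (as described further down the thread) and a hypothetical square tile of outputs per work-group; the function name and the tile size of 16 are illustrative and not taken from cl-waifu2x. It only counts global reads of input data and does not predict run time, which also depends on caches, occupancy, and barrier cost.

```python
def input_reads_per_mac(tile, ksize=3):
    """Global input reads per multiply-add for one input channel and one
    output channel, when a tile x tile block of outputs shares a single
    input patch staged in local memory (hypothetical tiling scheme)."""
    # Input values loaded once per tile, including the kernel-sized halo.
    patch = (tile + ksize - 1) ** 2
    # Multiply-adds that this one patch serves.
    macs = tile * tile * ksize * ksize
    return patch / macs


# tile=1 is the no-reuse case: every multiply-add fetches its own input.
naive = input_reads_per_mac(1)
# tile=16 is an assumed work-group tile, chosen only for illustration.
tiled = input_reads_per_mac(16)
print(f"no reuse: {naive:.3f} input reads per multiply-add")
print(f"16x16 tile: {tiled:.3f} input reads per multiply-add")
```

Fewer global reads on paper does not guarantee a faster kernel, which is consistent with the experience reported in the tweet above.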
@FioraAeterna This is the horrible thing it's trying to compute (well, the worst step). 128 ins, 128 outs, full mesh. pic.twitter.com/aOpHDYoL0z
@FioraAeterna The "kernels" are distinct 3x3 convolutions. So there are 128x128 independent 3x3 kernels, for each in,out pair.
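The step described above can be sketched as a plain reference loop. This is an illustrative pure-Python version, not the OpenCL kernel from the repository: the function name is hypothetical, borders are handled as "valid" (the output shrinks by 2 in each dimension), no bias or activation is applied, and the kernel is applied without flipping, as is usual in CNN code. Those choices are assumptions. The counters show why the naive form is read-heavy: every multiply-add fetches one input value and one weight.

```python
def full_mesh_conv3x3(inputs, kernels):
    """inputs:  list of C_in planes, each an H x W list of lists.
    kernels: kernels[o][i] is the 3x3 kernel for the (input i, output o) pair.
    Returns (outputs, reads, macs); each output plane is (H-2) x (W-2)."""
    c_in, c_out = len(inputs), len(kernels)
    h, w = len(inputs[0]), len(inputs[0][0])
    reads = macs = 0
    outputs = []
    for o in range(c_out):
        plane = [[0.0] * (w - 2) for _ in range(h - 2)]
        for y in range(h - 2):
            for x in range(w - 2):
                acc = 0.0
                # Full mesh: every input plane contributes to every output plane.
                for i in range(c_in):
                    k = kernels[o][i]
                    src = inputs[i]
                    for dy in range(3):
                        for dx in range(3):
                            acc += src[y + dy][x + dx] * k[dy][dx]
                            reads += 2  # one input value + one weight
                            macs += 1
                plane[y][x] = acc
        outputs.append(plane)
    return outputs, reads, macs


# Tiny demo sizes so pure Python finishes quickly; the thread's layer is 128 x 128.
demo_in = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
demo_k = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(3)]
outs, reads, macs = full_mesh_conv3x3(demo_in, demo_k)
print("reads per multiply-add:", reads / macs)
print("multiply-adds per output value at 128 inputs:", 128 * 9)
```

With 128 inputs, each output value needs 128 x 9 = 1152 multiply-adds, and the naive loop issues two reads for each of them.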