@marcan42 stuff like this: total_pixels = bh * bw * ((h+bh-pad-1)//(bh-pad)) * ((w+bw-pad-1)//(bw-pad)) [...]
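The block-count factors in that expression are integer ceiling division. Below is a minimal Python sketch. It assumes, as the formula implies but the thread does not state, that consecutive bh × bw blocks start bh - pad rows and bw - pad columns apart, so each block overlaps its neighbour by pad pixels. The function names and the example sizes are hypothetical.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b for positive b; same as (a + b - 1) // b."""
    return (a + b - 1) // b


def total_pixels(h: int, w: int, bh: int, bw: int, pad: int) -> int:
    """Pixels processed when an h x w image is covered by bh x bw blocks
    that advance by (bh - pad) rows and (bw - pad) columns (assumed layout)."""
    if pad >= bh or pad >= bw:
        raise ValueError("pad must be smaller than the block size")
    blocks_y = ceil_div(h, bh - pad)  # == (h + bh - pad - 1) // (bh - pad)
    blocks_x = ceil_div(w, bw - pad)  # == (w + bw - pad - 1) // (bw - pad)
    return bh * bw * blocks_y * blocks_x


def overhead(h: int, w: int, bh: int, bw: int, pad: int) -> float:
    """Ratio of processed pixels to image pixels (>= 1 because of overlap)."""
    return total_pixels(h, w, bh, bw, pad) / (h * w)
```

For a hypothetical 1000 × 1000 image with 128 × 128 blocks and pad = 14, this gives 9 × 9 blocks and 1,327,104 processed pixels. That is about 1.33× the image area, which is the cost of the overlap.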
@marcan42 and the variable-trip-count loops make me, as a compiler person, REALLY wish those could be constants...
@FioraAeterna Thankfully none of that is performance-sensitive code; the OpenCL kernel is all that matters :-)
@marcan42 oh gods that is a horrifying ratio of memory reads to arithmetic
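One way to quantify that ratio is a back-of-the-envelope Python sketch of arithmetic intensity for a k × k convolution. The sketch uses a worst-case model in which every input value and every weight is a separate float32 global-memory read with no reuse. This model is an assumption made for illustration. It is not a measurement of the cl-waifu2x kernel, and the names are hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_FLOAT = 4  # assumes float32 activations and weights


@dataclass
class ConvCost:
    flops: int       # floating-point operations for one output value
    bytes_read: int  # bytes fetched from global memory for that value

    @property
    def intensity(self) -> float:
        """Arithmetic intensity in FLOPs per byte read."""
        return self.flops / self.bytes_read


def naive_conv_cost(in_ch: int, k: int = 3) -> ConvCost:
    """Cost of one output value of a k x k convolution over in_ch input
    channels when every operand comes straight from global memory."""
    taps = k * k * in_ch
    flops = 2 * taps  # one multiply and one add per tap
    reads = 2 * taps  # one input value and one weight per tap
    return ConvCost(flops=flops, bytes_read=reads * BYTES_PER_FLOAT)
```

Under these assumptions the intensity is 0.25 FLOP per byte for any channel count. That is well below the several FLOPs per byte a typical GPU needs to be compute-bound, so a kernel like this is limited by memory bandwidth.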
@marcan42 I look at https://github.com/marcan/cl-waifu2x/commit/89176cc52b99ea7bdb0b9b71589379f3d751fe95#diff-d2a065162d2048b6908eee284e028eafR24 … and recoil in horror
@marcan42 are those at least in local memory, or are these all global reads... *sob*
@FioraAeterna Global... I've been trying to use local memory to reuse data but everything I try makes it *slower*. *sob*
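The reuse that local memory is supposed to buy can be estimated by counting global loads. The following is a minimal Python sketch. It assumes that a work-group computes a tile × tile block of outputs from a k × k convolution. In the staged version, the work-group first copies the input patch, including its (k - 1)-pixel halo, into local memory. The tiling scheme and function names are hypothetical illustrations, not what cl-waifu2x does.

```python
def global_reads_no_reuse(tile: int, k: int = 3) -> int:
    """Global loads per input channel for a tile x tile block of outputs
    when every work-item fetches its own k x k window."""
    return tile * tile * k * k


def global_reads_local_tile(tile: int, k: int = 3) -> int:
    """Global loads per input channel when the work-group first copies the
    (tile + k - 1)^2 input patch, halo included, into local memory."""
    side = tile + k - 1
    return side * side


def reuse_factor(tile: int, k: int = 3) -> float:
    """How many times fewer global loads the staged version issues."""
    return global_reads_no_reuse(tile, k) / global_reads_local_tile(tile, k)
```

For tile = 16 and k = 3, that is 2304 versus 324 loads per input channel, roughly a 7× reduction on paper. The count ignores several costs: barrier overhead, lower occupancy from local-memory use, and any hardware caches that may already capture much of this reuse. These costs are one plausible reason a hand-written local-memory version can end up slower in practice.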
@marcan42 but I work on a mobile GPU that isn't built around gobs and gobs of cache, so maybe it's not as bad on a huge GPU
@FioraAeterna Yeah I'm on a mobile GPU too (GeForce GTX 660M / GK107M).