Working on an OpenCL version of waifu2x: https://github.com/marcan/cl-waifu2x … (not even remotely complete yet, please don't judge its performance!)
@DrDaxxy Since the kernel is pretty unoptimized, it's also probably vastly more sensitive to architectural differences.
@DrDaxxy E.g. I'm pretty sure the bottleneck here is __global memory accesses, since I'm (dumbly) reading the same data 128 times.
@DrDaxxy Next step is to load the data into __local memory once and work on it there. I still need to figure out the workgroup partitioning.
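Here is a minimal plain-C sketch of the difference between reading __global memory for every output and staging a tile in __local memory first. It is not the cl-waifu2x kernel. The layer shape (8x8 input, 4 input channels, 128 output channels, 3x3 kernel), the TILE size, the data, and the helper names (run_naive, run_tiled, outputs_match) are all hypothetical, and ordinary C arrays stand in for the two OpenCL address spaces. In the tiled version, each "workgroup" copies the input patch its outputs need into a local buffer once, then computes all 128 output channels from that copy.

```c
#include <string.h>

/* Hypothetical, deliberately tiny layer shape chosen for illustration. */
#define W 8              /* input width */
#define H 8              /* input height */
#define NIN 4            /* input channels */
#define NOUT 128         /* output channels: each one needs every input value */
#define K 3              /* 3x3 convolution kernel */
#define OW (W - K + 1)   /* "valid" output width (6) */
#define OH (H - K + 1)   /* "valid" output height (6) */
#define TILE 3           /* output pixels per workgroup side (hypothetical) */

static float in[NIN][H][W];          /* stands in for the __global input */
static float wt[NOUT][NIN][K][K];    /* convolution weights */
static float out_naive[NOUT][OH][OW];
static float out_tiled[NOUT][OH][OW];

/* Small integer values, so every sum is exact in float. */
static void init_data(void) {
    for (int c = 0; c < NIN; c++)
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                in[c][y][x] = (float)((c * H * W + y * W + x) % 7);
    for (int o = 0; o < NOUT; o++)
        for (int c = 0; c < NIN; c++)
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    wt[o][c][ky][kx] = (float)((o + c + ky * K + kx) % 5 - 2);
}

/* Naive: each (output channel, pixel) iteration plays one work-item and
   fetches every input operand from "global" memory. Returns input reads. */
long run_naive(void) {
    long reads = 0;
    init_data();
    for (int o = 0; o < NOUT; o++)
        for (int y = 0; y < OH; y++)
            for (int x = 0; x < OW; x++) {
                float acc = 0.0f;
                for (int c = 0; c < NIN; c++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++) {
                            acc += in[c][y + ky][x + kx] * wt[o][c][ky][kx];
                            reads++; /* one global read per multiply-add */
                        }
                out_naive[o][y][x] = acc;
            }
    return reads;
}

/* Tiled: each TILE x TILE block of outputs plays one workgroup. It copies
   its input patch (tile plus a K-1 halo) into a local buffer once, then
   computes all NOUT channels from that buffer. Returns input reads. */
long run_tiled(void) {
    long reads = 0;
    float tile_buf[NIN][TILE + K - 1][TILE + K - 1]; /* stands in for __local */
    init_data();
    for (int ty = 0; ty < OH; ty += TILE)
        for (int tx = 0; tx < OW; tx += TILE) {
            for (int c = 0; c < NIN; c++)
                for (int y = 0; y < TILE + K - 1; y++)
                    for (int x = 0; x < TILE + K - 1; x++) {
                        tile_buf[c][y][x] = in[c][ty + y][tx + x];
                        reads++; /* the only global reads in this version */
                    }
            for (int o = 0; o < NOUT; o++)
                for (int y = 0; y < TILE; y++)
                    for (int x = 0; x < TILE; x++) {
                        float acc = 0.0f;
                        for (int c = 0; c < NIN; c++)
                            for (int ky = 0; ky < K; ky++)
                                for (int kx = 0; kx < K; kx++)
                                    acc += tile_buf[c][y + ky][x + kx] * wt[o][c][ky][kx];
                        out_tiled[o][ty + y][tx + x] = acc;
                    }
        }
    return reads;
}

/* Nonzero if both versions produced bit-identical outputs. */
int outputs_match(void) {
    return memcmp(out_naive, out_tiled, sizeof out_naive) == 0;
}
```

In this toy shape, run_naive() performs 165,888 input reads (128 × 36 × 4 × 9), while run_tiled() performs 400 (4 tiles × 4 channels × 25), and both produce the same outputs. On a real GPU, caches absorb part of the naive version's redundant reads, so the actual speedup would likely be much smaller than that ratio. The workgroup partitioning question shows up here as the choice of TILE: larger tiles amortize the halo better but need more __local memory.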
@marcan42 GPU code tends to be ;) I still found the speed difference striking, even accounting for slow mobile vs. fast desktop hardware.
@DrDaxxy It's probably very sensitive to memory bandwidth and cache right now. It's using less than 2% of the raw FLOPS of my GPU.
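One common way to reason about a kernel that is bandwidth-bound and using only a small fraction of peak FLOPS is the roofline bound: attainable throughput is at most min(peak compute, memory bandwidth × arithmetic intensity). The helper below, roofline_fraction, is a hypothetical name for a minimal sketch of that bound, and the numbers in the usage note are illustrative assumptions, not measurements of any particular GPU.

```c
/* Roofline bound: a kernel can run no faster than the lesser of the
   peak compute rate and (memory bandwidth * arithmetic intensity).
   Returns that bound as a fraction of peak compute. */
double roofline_fraction(double peak_gflops, double bandwidth_gbs,
                         double flops_per_byte) {
    double bound = bandwidth_gbs * flops_per_byte; /* GFLOP/s memory allows */
    if (bound > peak_gflops)
        bound = peak_gflops; /* compute-bound: capped at peak */
    return bound / peak_gflops;
}
```

For example, with an assumed 4000 GFLOP/s peak, 200 GB/s of bandwidth, and 0.5 FLOP/byte (one 4-byte load per 2-FLOP multiply-add, ignoring caches), roofline_fraction(4000.0, 200.0, 0.5) gives 0.025, i.e. 2.5% of peak. Under this model, raising arithmetic intensity, for instance by reusing data staged in __local memory, is how a kernel moves toward the compute limit.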