I’ve now implemented an all-GPU-compute pipeline in a Pathfinder branch, avoiding the rasterizer entirely (the CPU is still used for tiling). However, so far it seems that using a compute shader to create vector tiles and the standard GPU rasterizer to composite them is fastest…
My understanding is that on desktop it has some kind of internal HW queue of fragments coming from the shading units to reduce global memory traffic?
-
I just saw your code. You're computing the blending for all tiles in a single thread. When you use the rasterizer, the pixel shader runs computeCoverage() in different threads, then performs the final blending stage serialized. [1/2]
-
If you want to beat the ROP, you need N threads to call computeCoverage() and store the results into shared memory, then a barrier, then merge the results using a parallel sum algorithm and have 1 thread store the final combined value [2/2]
- 7 more replies
New conversation -
(You probably know this stuff a lot better than I do though)
-
Got it, yeah, not part of the rasterizer, but I get what you mean. There are really two big optimizations around blending that HW will try to do: performing the blending in on-chip memory, and allowing concurrent execution of overlapping fragments up to the blending stage.
- 4 more replies
New conversation -
It's not specifically part of the "rasterizer", but I don't think there is a way to access the ROPs without using the rasterization pipeline. Maybe via a Vulkan extension, but I doubt it.