I’ve now implemented an all-GPU-compute pipeline in a Pathfinder branch, avoiding the rasterizer entirely (CPU still used for tiling). However, so far it seems that using a compute shader to create vector tiles and the standard GPU rasterizer to composite them is fastest…
This makes sense to me: hard to beat the HW rasterizer for blending. But what’s strange is that I’ve seen numbers that show compute being 40x faster than what I’m seeing on comparable hardware, and I can’t reproduce them…
Replying to @pcwalton
What does the HW rasterizer have to do with blending? Producing a launch order of threads you believe to result in a better access pattern?
Replying to @jhaberstro
My understanding is that on desktop it has some kind of internal HW queue of fragments coming from the shading units to reduce global memory traffic?
Replying to @pcwalton @jhaberstro
(You probably know this stuff a lot better than I do though)
Replying to @pcwalton
Got it. Yeah, that's not part of the rasterizer, but I get what you mean. There are really two big optimizations around blending that HW will try to do: performing the blending in on-chip memory, and allowing concurrent execution of overlapping fragments up to the blending stage.
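A toy model of the second optimization, to make the idea concrete: overlapping fragments for one pixel may finish shading in any order, but the blend stage still applies them in submission (primitive) order. This is a conceptual sketch only, not how any particular GPU implements it; the reorder buffer, the `shade`/`blend_in_order` names, and premultiplied-alpha source-over blending are all assumptions for illustration.

```python
import random


def over(dst, src):
    """Porter-Duff source-over with premultiplied alpha: src drawn on top of dst."""
    a = src[3]
    return tuple(s + d * (1.0 - a) for s, d in zip(src, dst))


def shade(prim_id, color):
    """Hypothetical stand-in for a fragment shader; in HW this work runs concurrently."""
    return prim_id, color


def blend_in_order(fragments, seed=0):
    """Shade one pixel's fragments in a scrambled order, then blend in submission order.

    fragments[i] is the premultiplied RGBA color produced by primitive i.
    A small reorder buffer (`pending`) holds finished fragments until every
    earlier primitive has been blended, so the result never depends on the
    order in which shading happened to complete.
    """
    completion_order = list(range(len(fragments)))
    random.Random(seed).shuffle(completion_order)

    pending = {}
    next_id = 0
    acc = (0.0, 0.0, 0.0, 0.0)
    for i in completion_order:          # order in which "shader" work finishes
        prim_id, color = shade(i, fragments[i])
        pending[prim_id] = color
        while next_id in pending:       # retire strictly in primitive order
            acc = over(acc, pending.pop(next_id))
            next_id += 1
    return acc


# Opaque red underneath, 50%-alpha blue on top: same result for any completion order.
frags = [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 0.5, 0.5)]
print(blend_in_order(frags, seed=3))
```

A plain compute shader blending into a local variable has to serialize this per pixel itself, which is one plausible place for the gap discussed below.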
Replying to @jhaberstro @pcwalton
Rendering will very likely opt you into some form of lossless framebuffer compression too. As for why your compute blending is slower, it’s hard to tell without knowing how you implemented it. Are you blending into textures directly or into threadgroup memory?
Replying to @jhaberstro
Into threadgroup memory: just a plain old local variable. Perhaps what I’m losing is the ability to execute overlapping fragments concurrently, and that explains the loss.
Replying to @pcwalton
So roughly you have one framebuffer-sized compute dispatch, and each thread is responsible for blending all the layers that overlap its pixel? Is the code open source so I can look at the two versions you have?
Yep (though it’s 4 pixels per thread, and there is also a fallback to image load/store based blending if I have to break a batch for whatever reason, but that’s not the usual path). Standard path: https://github.com/servo/pathfinder/blob/master/shaders/tile.fs.glsl … Compute-based path: https://github.com/pcwalton/pathfinder/blob/tile-fill/shaders/tile_fill.cs.glsl …
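For readers who don't want to dig through the GLSL, here is a rough CPU simulation of the compute path as described in the thread: one invocation per 4 pixels, each compositing every overlapping layer back to front into a local accumulator before writing the result once. This is not Pathfinder's `tile_fill.cs.glsl`; the `Layer` rectangle type, `run_thread`, `composite`, `PIXELS_PER_THREAD`, and premultiplied source-over blending are hypothetical choices made for the sketch.

```python
from dataclasses import dataclass

PIXELS_PER_THREAD = 4  # matches the "4 pixels per thread" mentioned above


@dataclass
class Layer:
    """Hypothetical layer: a solid-color rectangle, listed back to front."""
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    color: tuple  # premultiplied RGBA floats in [0, 1]

    def covers(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def over(dst, src):
    """Porter-Duff source-over with premultiplied alpha: src drawn on top of dst."""
    a = src[3]
    return tuple(s + d * (1.0 - a) for s, d in zip(src, dst))


def run_thread(thread_id, width, layers):
    """One simulated compute invocation handling PIXELS_PER_THREAD linear pixels.

    The accumulator is a local variable (a register in a real shader), so the
    framebuffer is only touched once per pixel, after all layers are blended.
    """
    base = thread_id * PIXELS_PER_THREAD
    out = []
    for i in range(PIXELS_PER_THREAD):
        idx = base + i
        x, y = idx % width, idx // width
        acc = (0.0, 0.0, 0.0, 0.0)
        for layer in layers:  # back to front, strictly sequential per pixel
            if layer.covers(x, y):
                acc = over(acc, layer.color)
        out.append(acc)
    return base, out


def composite(width, height, layers):
    """Simulate a single framebuffer-sized dispatch (run serially here)."""
    assert (width * height) % PIXELS_PER_THREAD == 0
    fb = [None] * (width * height)
    for tid in range(width * height // PIXELS_PER_THREAD):
        base, pixels = run_thread(tid, width, layers)
        fb[base:base + PIXELS_PER_THREAD] = pixels
    return fb


layers = [
    Layer(0, 0, 4, 2, (1.0, 0.0, 0.0, 1.0)),  # opaque red background
    Layer(2, 0, 4, 2, (0.0, 0.0, 0.5, 0.5)),  # 50%-alpha blue, right half
]
print(composite(4, 2, layers))
```

Note the inner layer loop: within one pixel, nothing overlaps, which is exactly the concurrency the HW blend path gets for free.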