I’ve now implemented an all-GPU-compute pipeline in a Pathfinder branch, avoiding the rasterizer entirely (CPU still used for tiling). However, so far it seems that using a compute shader to create vector tiles and the standard GPU rasterizer to composite them is fastest…
Into threadgroup memory—just a plain old local variable. Perhaps what I’m losing is the ability to execute overlapping fragments concurrently and that explains the loss.
-
-
So roughly you have one framebuffer-sized compute dispatch, and each thread is responsible for blending all the layers that overlap its pixel? Is the code open source so I can look at the two versions you have?
-
Yep (though it’s 4 pixels per thread, and there is also a fallback to image load/store based blending if I have to break a batch for whatever reason, but that’s not the usual path). Standard path: https://github.com/servo/pathfinder/blob/master/shaders/tile.fs.glsl … Compute-based path: https://github.com/pcwalton/pathfinder/blob/tile-fill/shaders/tile_fill.cs.glsl …
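For readers following along, here is a rough CPU-side sketch in Python of the dispatch shape described above: one simulated thread per group of 4 pixels, each blending every overlapping layer into a plain local variable and writing the result once. It is not the code from either linked shader. The names `blend_over`, `run_thread`, and `dispatch` are hypothetical, layers are modeled as functions returning premultiplied RGBA, and source-over blending is an assumption.

```python
PIXELS_PER_THREAD = 4  # the "4 pixels per thread" figure from the thread


def blend_over(dst, src):
    """Source-over blend of two premultiplied RGBA tuples (assumed blend mode)."""
    inv = 1.0 - src[3]
    return tuple(s + d * inv for s, d in zip(src, dst))


def run_thread(thread_x, y, layers, width):
    """One simulated thread: blend all layers for its pixels in local variables."""
    x0 = thread_x * PIXELS_PER_THREAD
    out = []
    for x in range(x0, min(x0 + PIXELS_PER_THREAD, width)):
        color = (0.0, 0.0, 0.0, 0.0)  # local accumulator, not stored mid-loop
        for layer in layers:  # layers are ordered back to front
            color = blend_over(color, layer(x, y))
        out.append(color)
    return out


def dispatch(width, height, layers):
    """Framebuffer-sized dispatch: every pixel is covered by exactly one thread."""
    threads_x = (width + PIXELS_PER_THREAD - 1) // PIXELS_PER_THREAD
    framebuffer = [[None] * width for _ in range(height)]
    for y in range(height):
        for tx in range(threads_x):
            for i, color in enumerate(run_thread(tx, y, layers, width)):
                framebuffer[y][tx * PIXELS_PER_THREAD + i] = color  # single write
    return framebuffer
```

In this sketch each thread walks its layers serially, whereas a rasterizer-based composite can keep many overlapping fragments in flight at once, which is the trade-off speculated about earlier in the thread.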