Particle sim P1 (integrator): Typed buffers double the number of loads/stores, because the typed UAV load limitation forces in-place updates to use single-channel uint32/float32 formats. However, this pass is fully memory bound, so the added memory instructions have no perf impact.
Shape match reduction: This pass loses some scalar optimizations (typed buffers aren't scalar optimized). VGPR count = +4. The increased number of narrow typed loads has a big perf impact, because this pass loads all particles twice. 40% slower with typed buffers.
Physics sim pass 2 (apply shape match, collisions, deform): Huge +16 VGPR cost (64 -> 80) because of lost scalar opts (typed buffers can't be scalar optimized). 2x load/store count (typed UAV load limit). Fortunately this shader is fully memory bandwidth bound. Only 3% slower.
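To put the 64 -> 80 VGPR jump in perspective, here is a rough occupancy estimate. It is only a sketch: the thread does not name the GPU, so the GCN-style constants (256 VGPRs per SIMD lane, at most 10 waves per SIMD, allocation granularity of 4) and the function name `maxWavesPerSimd` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>

// Assumed GCN-style limits (not stated in the thread).
constexpr std::uint32_t kVgprBudgetPerLane = 256; // VGPRs available per SIMD lane
constexpr std::uint32_t kMaxWavesPerSimd   = 10;  // wave slots per SIMD
constexpr std::uint32_t kVgprGranularity   = 4;   // VGPR allocation granularity

// Upper bound on resident waves per SIMD when VGPRs are the only limiter.
inline std::uint32_t maxWavesPerSimd(std::uint32_t vgprsPerThread) {
    if (vgprsPerThread == 0) {
        return kMaxWavesPerSimd; // no VGPR pressure at all
    }
    // Round the request up to the allocation granularity.
    const std::uint32_t allocated =
        (vgprsPerThread + kVgprGranularity - 1) / kVgprGranularity * kVgprGranularity;
    return std::min(kMaxWavesPerSimd, kVgprBudgetPerLane / allocated);
}
```

Under these assumptions, `maxWavesPerSimd(64)` gives 4 waves and `maxWavesPerSimd(80)` gives 3, so the typed buffer path would cost one wave of latency hiding per SIMD.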
Particle mesh prepare (offload from VS): +4 VGPR because of lost scalar opts. Loads 4x particle neighbors to calculate normal/tangent. Lots of extra loads because of typed UAV load limitations. Particle positions are 4x16b = Load2 (or 2x typed R32 load). BW bound -> only 8% slower.
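The 4x16b position layout can be sketched on the CPU. In a shader, the two 32-bit words would come from a single raw Load2 or from two typed R32 loads; the unpacking afterwards is the same. The packing order (x and y in the first word, z and w in the second, first component in the low 16 bits) is an assumption, and `PackedPosition`, `packPosition` and `unpackPosition` are hypothetical names.

```cpp
#include <array>
#include <cstdint>

// Two 32-bit words holding four 16-bit components (assumed layout).
struct PackedPosition {
    std::uint32_t xy; // x in bits 0..15, y in bits 16..31
    std::uint32_t zw; // z in bits 0..15, w in bits 16..31
};

// Packs four 16-bit components into two 32-bit words.
inline PackedPosition packPosition(const std::array<std::uint16_t, 4>& p) {
    PackedPosition words;
    words.xy = static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 16);
    words.zw = static_cast<std::uint32_t>(p[2]) | (static_cast<std::uint32_t>(p[3]) << 16);
    return words;
}

// Recovers the four 16-bit components from the two loaded words.
inline std::array<std::uint16_t, 4> unpackPosition(const PackedPosition& words) {
    std::array<std::uint16_t, 4> p;
    p[0] = static_cast<std::uint16_t>(words.xy & 0xFFFFu);
    p[1] = static_cast<std::uint16_t>(words.xy >> 16);
    p[2] = static_cast<std::uint16_t>(words.zw & 0xFFFFu);
    p[3] = static_cast<std::uint16_t>(words.zw >> 16);
    return p;
}
```

Each neighbor position therefore costs two load instructions on the typed R32 path instead of one, which is where the extra loads in this pass come from.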
Fluid sim (pressure calc pass): +8 VGPR because of lost scalar optimizations. Other fluid sim passes seem fine. Fluid sim passes are fully async compute overlapped with rendering, so exact perf analysis is not possible.
Added VGPRs and a higher VMEM issue rate aren't as much of a problem as I expected for passes without async compute overlap. These things are a bigger problem with async compute overlap. Fortunately I don't need to use the typed buffer code path on platforms that benefit greatly from async compute.
Optimization possibility: Use an R32G32_UINT read-only view (SRV) for the shape match reduction pass and the render setup pass, because these passes write their output to a separate buffer. The typed UAV load format limitation is not an issue in these cases.
You are allowed to create multiple raw/typed views to the same buffer with different element bit depths. This trick is not legal for textures. Texture views must have equal bit depth to the typeless format of the resource. Also, structured buffers don't support typed/raw views.
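As a CPU-side analogy only (this is not D3D code), two views with different element bit depths over the same buffer amount to reading the same bytes with a different stride: 4 bytes per element for an R32_UINT-like view, 8 bytes per element for an R32G32_UINT-like view. The helper names `loadR32` and `loadR32G32` are hypothetical, and neither helper does bounds checking.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads element `index` through a 4-byte-stride view (R32_UINT-like).
// The caller must keep `index` within the buffer.
inline std::uint32_t loadR32(const std::vector<std::uint8_t>& buffer, std::size_t index) {
    std::uint32_t value = 0;
    std::memcpy(&value, buffer.data() + index * sizeof(std::uint32_t), sizeof(std::uint32_t));
    return value;
}

// Reads element `index` through an 8-byte-stride view (R32G32_UINT-like).
// The caller must keep `index` within the buffer.
inline std::array<std::uint32_t, 2> loadR32G32(const std::vector<std::uint8_t>& buffer,
                                               std::size_t index) {
    constexpr std::size_t kElementBytes = 2 * sizeof(std::uint32_t);
    std::array<std::uint32_t, 2> value = {0, 0};
    std::memcpy(value.data(), buffer.data() + index * kElementBytes, kElementBytes);
    return value;
}
```

Element `i` of the wide view covers elements `2i` and `2i + 1` of the narrow view, so one wide load replaces two narrow ones.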
Also worth noting that simultaneous SRV (read) and UAV (write) binding to the same resource is not allowed. DX11 will automatically unbind one of them in this case. You can't use this to avoid the typed UAV load issue.