I'm not sure I’ve ever seen InterlockedAdd (HLSL) work correctly on Nvidia. Works on AMD tho. Can’t help but wonder if I’m doing something wrong, and AMD is more permissive. Seen a few reports of other people noticing the same thing.
See, that's my line of thought. Always question your own code first, right? Just something to the gist of:

    RWBuffer<uint> someBuffer;

    [numthreads(64,1,1)]
    void main( uint3 dtId : ... )
    {
        //...
        InterlockedAdd(someBuffer[dtId.x], someVal);
    }
-
Is the underlying data in your buffer R32_UINT formatted? I always steer clear of atomics on RWBuffer because someone could change the buffer to be R16_UINT or R8_UINT behind the scenes and then atomics won't work any more.
-
It's definitely R32_UINT, although I'll take the advice about avoiding InterlockedAdd on them. I'm not sure I could find a way not to in this case. Unfortunately I can't go into too much detail without risking breaking NDA.
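For reference, the suggestion above about steering clear of atomics on a typed RWBuffer might look roughly like the sketch below. It is only an illustration, not the poster's actual shader: the names `counters`, `values` and `CSMain`, the register slots, and the `SV_DispatchThreadID` semantic are all assumptions. A structured buffer of `uint` always has a 32-bit element, so no view format on the API side can silently narrow it.

```hlsl
// Hypothetical sketch: names, register slots and the semantic are assumptions.
// The element size of a structured buffer is fixed by its HLSL type, so a
// 16-bit or 8-bit view format cannot be bound to it by accident.
RWStructuredBuffer<uint> counters : register(u0);
StructuredBuffer<uint>   values   : register(t0);

[numthreads(64, 1, 1)]
void CSMain(uint3 dtId : SV_DispatchThreadID)
{
    // Stand-in for however someVal is computed in the real shader.
    uint someVal = values[dtId.x];

    // Atomic add on a 32-bit element of the structured buffer.
    InterlockedAdd(counters[dtId.x], someVal);
}
```

This only removes the format dependency. It does not, by itself, order these writes against a later dispatch that reads the buffer.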
-
64 threads is one wavefront on AMD but two warps on Nvidia. What do you do with someBuffer after? You're probably missing memory barriers.
-
I'm willing to accept a memory barrier is missing. Every time I feel like I understand barriers, they seem to have another hand to reveal. someBuffer is used in another compute shader immediately following this one.