Ok so a n00b at compute shader optimization, like me, would ask: are the optimization/profiling tools really, uhm, "a bit lacking" there? E.g. here I have a PC with an NVIDIA GPU. Nsight shows "yeah, you're running at 30% of speed-of-light, good luck".
Then you prune a branch and replace it with a reasonable constant. Then you know the cost of that branch. It's messy, inexact, and you have to write your shader a certain way, but it's one method I've used in the past with reasonable results.
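The pruning trick can be sketched roughly like this (a hypothetical HLSL compute shader; the buffer names, `ExpensiveFunction`, and the stand-in constant are all made up for illustration):

```hlsl
StructuredBuffer<float> Input;
RWStructuredBuffer<float> Output;

[numthreads(64, 1, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    float v = Input[id.x];

    // Variant A (original): keep the data-dependent branch.
    // if (v > 0.5) { v = ExpensiveFunction(v); }  // hypothetical expensive path

    // Variant B (pruned): substitute a plausible constant for the
    // branch's result, so the compiler eliminates the whole path.
    v = 0.75;  // made-up stand-in value

    Output[id.x] = v;
}
```

Timing both variants and taking the difference gives a rough empirical estimate of the branch's cost; as noted above, it's inexact, since removing the branch can also change things like register pressure and scheduling, not just raw ALU work.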
-
This likely won't give you a good view, since SGPR and VGPR usage (plus LDS for compute) affect occupancy. E.g. at the low-occupancy end, moving from 129 to 128 VGPRs can nearly double the performance (halve the execution time) of a program.
-
Absolutely. I've actually seen that sort of thing myself. That's why I added the bit at the end about it being messy and inexact. But it still gives you empirical evidence: by taking out that branch you did gain a lot, though possibly not for the reason you think.