Do you think replacing: float t = max(-n.z, 0.0); with something like: float t = (abs(n.z) - n.z) / 2.0; would give an extra (small) boost? Would it be worth it? Taken from here: https://stackoverflow.com/a/7424900
-
-
-
Assuming you are doing this on a GPU or with SSE then max is a single instruction, so it would actually be worse.
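
For reference, a minimal C++ sketch of the two forms being compared, assuming the rewrite follows the identity max(a, b) = (a + b + |a - b|) / 2 from the linked answer with a = -n.z and b = 0. The function names are made up for illustration. It only shows that the two forms produce the same value; it says nothing about which is faster on a given GPU.

```cpp
#include <algorithm>
#include <cmath>

// Form used in the thread: a single max, which maps to one instruction
// on GPUs and on SSE.
inline float clamp_neg_max(float z) {
    return std::max(-z, 0.0f);
}

// Arithmetic rewrite: max(-z, 0) = (-z + 0 + |-z - 0|) / 2 = (|z| - z) / 2.
// This needs an abs, a subtract and a multiply, so it is more work, not less.
inline float clamp_neg_abs(float z) {
    return (std::fabs(z) - z) * 0.5f;
}
```

Both return -z for negative z and 0 otherwise, so the rewrite only trades one instruction for several.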
-
-
Where does the estimated perf cost come from?
-
SCUI (PlayStation 4 Shader Analysis Tool)
-
-
Nice!! I updated this guy: https://www.shadertoy.com/view/Mtfyzl . I looped 50k times till I got 13 fps, then tried your version, the original, and also replacing the CMOVs with SIGN+MUL (copysign?), but always got the same 13 fps. It's WebGL, so who knows what this is being compiled to anyway.
-
Yeah, obviously your mileage may vary with different compilers and platforms, but I think it is a pretty neat simplification and it is unlikely to be worse than the original :). I just did a quick test in your shadertoy and I actually get a speedup if I do a loop ~17fps->~21fps.
-
-
How does it compare with (or apply to) what @ZedCull is doing here: http://johnwhite3d.blogspot.co.uk/2017/10/signed-octahedron-normal-encoding.html ?
-
His approach seems to basically be hemi-octahedron encoding with explicit sign bit for z, so it is a 3 component scheme. I will have to look at perf at work, but they seem very comparable.
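
A rough C++ sketch of what "hemi-octahedron encoding with an explicit sign bit for z" could look like, based only on the description above. It is not the code from the linked blog post; the struct and function names are hypothetical, and the input is assumed to be a unit vector.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

// Three components: two hemi-octahedral coordinates plus the sign of z.
struct SignedHemiOct { float u, v; bool negative_z; };

// Project onto the octahedron |x| + |y| + |z| = 1, then rotate the xy
// diamond by 45 degrees so that u and v each span [-1, 1].
inline SignedHemiOct encode_signed_hemi_oct(Vec3 n) {
    float inv_l1 = 1.0f / (std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z));
    float px = n.x * inv_l1;
    float py = n.y * inv_l1;
    return { px + py, px - py, n.z < 0.0f };
}

// Undo the rotation, rebuild |z| from the octahedron equation, apply the
// stored sign, and renormalize.
inline Vec3 decode_signed_hemi_oct(SignedHemiOct e) {
    float x = 0.5f * (e.u + e.v);
    float y = 0.5f * (e.u - e.v);
    float z = 1.0f - std::fabs(x) - std::fabs(y);
    if (e.negative_z) z = -z;
    float inv_len = 1.0f / std::sqrt(x * x + y * y + z * z);
    return { x * inv_len, y * inv_len, z * inv_len };
}
```

No fold is needed here because the sign of z is carried separately, at the cost of a third component.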
-
-
GPUs still don't have call stacks and don't support recursion, so there really is no calling convention. Everything will just be inlined. Thinking about it like that makes it obvious that the compiler will hoist independent work out of the loop.
-
-
-
Right, this is essentially the @ZedCull approach, but with the sign bit packed into x. To get even precision you need x to have one more bit than y, so an odd number of bits in total. This approach, on the other hand, has even precision with an even total bit count because of the fold.
-
Also, this is just a faster/simpler way to do the old encoding, not a new encoding.
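
The post this thread replies to is not reproduced here, so the following is only a C++ sketch of the standard two-component octahedral mapping with the fold, using the `t = max(-n.z, 0.0)` decode discussed above. The type and function names are illustrative, and the input is assumed to be a unit vector.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float x, y, z; };

inline float sign_not_zero(float v) { return v >= 0.0f ? 1.0f : -1.0f; }

// Project onto the octahedron, then fold the lower hemisphere over the
// diagonals so the whole sphere fits in [-1, 1]^2.
inline Vec2 oct_encode(Vec3 n) {
    float inv_l1 = 1.0f / (std::fabs(n.x) + std::fabs(n.y) + std::fabs(n.z));
    float px = n.x * inv_l1;
    float py = n.y * inv_l1;
    if (n.z < 0.0f) {
        float fx = (1.0f - std::fabs(py)) * sign_not_zero(px);
        float fy = (1.0f - std::fabs(px)) * sign_not_zero(py);
        px = fx;
        py = fy;
    }
    return { px, py };
}

// Decode without branching on the hemisphere: t is zero on the upper half
// and equals |z| on the lower half, where it pulls x and y back toward 0.
inline Vec3 oct_decode(Vec2 e) {
    float x = e.x;
    float y = e.y;
    float z = 1.0f - std::fabs(x) - std::fabs(y);
    float t = std::max(-z, 0.0f);
    x += (x >= 0.0f) ? -t : t;  // the selects mentioned as CMOVs in the thread
    y += (y >= 0.0f) ? -t : t;
    float inv_len = 1.0f / std::sqrt(x * x + y * y + z * z);
    return { x * inv_len, y * inv_len, z * inv_len };
}
```

The encoded values are the same as in the usual octahedral scheme; only the decode arithmetic is shorter, which matches the point that this is not a new encoding.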