Scalarisation and wave intrinsics (to allow direct intra-warp communication) are what all the cool kids seem to do nowadays; here's a list of resources on the topic.
April 24, 2019
Rendering related threads
A place to store rendering related threads I find (or create) on Twitter for posterity.
Photo via @h3r2tic
Alright! Below I'm gonna post my collection of links related to
occlusion culling. If I missed good links, you folks are welcome to
post them below.
Frame graphs seem like a good idea to help handle the complexity of modern rendering engines/low level graphics APIs. Good collection of resources on them with some code projects: github.com/gfx-rs/gfx/wik.
Someone at work asked me today where do I find all those presentations about graphics techniques and made me realise that it might not be so common knowledge to people just starting gfx programming. Thread of links.
This is a prime example of why one should do algorithmic improvements first instead of micro-optimisations: Shadow pass started at 16ms, using a custom buffer layout to reduce mem bandwidth took it to 13ms, using SAH to reduce traversal steps took it down to 3.6ms (GTX 970).
Quote Tweet
HRT Sponza shadows, naive BVH vs BVH with Surface Area Heuristic heatmap comparison: On the left image blue is lowest, red is > 500 steps through the BVH per pixel. On the right image, adding SAH speeds up traversal x3.7 on the HD4000 (Good SAH tutorial medium.com/@bromanz/how-t).
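To make the SAH idea above concrete: the heuristic scores a candidate BVH split by weighting each child's primitive count with the chance that a ray hitting the parent also hits that child, approximated by the ratio of their surface areas. The following is a minimal sketch, not the code behind the numbers above; the function names and the two cost constants are illustrative assumptions.

```python
def surface_area(bmin, bmax):
    """Surface area of an axis-aligned box given its min and max corners."""
    dx, dy, dz = (bmax[i] - bmin[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)


def sah_cost(parent, left, right, n_left, n_right,
             c_traverse=1.0, c_intersect=1.0):
    """SAH estimate for splitting `parent` into `left` and `right`.

    Each box is a (min_corner, max_corner) pair. The two constants are
    hypothetical relative costs, not measured values.
    """
    sa_parent = surface_area(*parent)
    p_left = surface_area(*left) / sa_parent    # chance a ray also hits left
    p_right = surface_area(*right) / sa_parent  # chance a ray also hits right
    return c_traverse + c_intersect * (p_left * n_left + p_right * n_right)
```

A builder would evaluate `sah_cost` for several candidate split planes and keep the cheapest, or make a leaf when no split beats `c_intersect * n` for the unsplit node.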
Great read: "The Elusive Frame Timing" medium.com/@alen.ladavac/ and accompanying GDC presentation: gdcvault.com/play/1025407/A
Programming with compute shaders (efficiently), balancing workloads with resources and thinking in parallel, gives many opportunities to learn how GPUs really work (well, pretty close at least). A few links to get you started. (1/N)
A common theme in the questions I received so far is that beginners feel intimidated by graphics programming and do not know how to start. They need not be though as graphics programming can be approached in different ways and at many levels of complexity (1/5).
Early prototyping work for a previous game. This is combining volumetric lightshafts and low fog in one screenspace pass.
So below, I compiled a list of awesome people that you should follow if
you're interested in computer graphics! please retweet for
visibility. and do tell if I missed someone!
Daily Pathtracing, Part 1. Initial simple C++ (Win/Mac) implementation & walkthrough. aras-p.info/blog/2018/03/2 Next part: fixing a gross perf embarrassment in it.
alright, below I'm gonna list some professors/researchers in Computer Graphics whose research and papers I absolutely love.
you folks are also very much welcome to post your own suggestions below!
Awesome uniform load optimization for loops. Beats AMD scalar optimizations in performance and is as fast as constant buffer loads on Nvidia, but without any downsides. Supports typed/raw/structured buffer AND textures:
Good example of float precision issues if world space is used in rendering. I personally prefer camera centered world space (camera is 0,0,0). This is often better than view space, because. A) faster & less lossy xform, B) normals stay in world space (easy to sample cubemaps).
Quote Tweet
Good article by @dougbinks on precision issues encountered when using screen space derivatives for normal generation in frag shader: enkisoftware.com/devlogpost-201
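The precision point above can be shown in one dimension. This is a sketch under assumptions: `to_f32` emulates 32-bit storage with the standard `struct` module, the helper names are hypothetical, and the camera-relative path assumes the subtraction happens in double precision on the CPU before the value reaches the GPU, which the tweet does not spell out.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def world_space_offset(vertex, camera):
    """Store both positions as float32 first, then subtract."""
    return to_f32(to_f32(vertex) - to_f32(camera))


def camera_relative_offset(vertex, camera):
    """Subtract in double precision, then convert the small result."""
    return to_f32(vertex - camera)


# A vertex 1 mm away from a camera sitting 100 km from the origin.
camera = 100000.0
vertex = 100000.001
lost = world_space_offset(vertex, camera)      # offset collapses to zero
kept = camera_relative_offset(vertex, camera)  # offset survives
```

Around 100000 the spacing between adjacent 32-bit floats is 2^-7 (about 0.0078), so the millimetre offset disappears as soon as the world-space position is stored in 32 bits.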
Just learned that HLSL considers two structures to be equivalent for function overloading if they're the same size. This feels like a legacy behavior that should be fixed in future DirectX versions! See example here: social.msdn.microsoft.com/Forums/en-US/5
New blog post: Mesh Shader Possibilities! reedbeta.com/blog/mesh-shad Or why graphics programmers have been yelling about mesh shaders over the last couple weeks.
Finished my Claybook raw->typed buffer port. Here's a thread containing some performance analysis numbers on AMD GPU. TLDR: Use raw buffers if your platform supports them!
I published a new blog post with an overview of the new OpenGL and Vulkan extensions for the NVIDIA Turing architecture : blog.icare3d.org/2018/09/nvidia
This is an important slide. Scenes behind the 10 Gigarays/s number. Each scene has a single high poly mesh (no background). Primary rays = fully coherent.
The most curious GPU bottleneck: ROP exports apparently retire in submission order. If your PS has early out fast path and very slow generic path, exports of fast pixels will stall (wait previous slow pixel). Moved shadow cone trace PS->CS = 50% perf gain (both Nvidia and AMD).
Indexed skinning with 3x less indexed bone matrix memory loads. Supports up to 64 matrices on GCN (up to 32 on NVidia). Needs Vulkan 1.1 subgroupShuffle. Doesn't work in DX12, because SM 6.0 only exposes GCN2 equivalent wave intrinsics: shader-playground.timjones.io/337c3f42d5b357
Idea: sample-reusing reconstruction for area lighting (similar to reconstruction of reflections). Pick one light per pixel via single-slot weighted reservoir sampling, reuse spatially with approximate occlusion. 1spp, 1spp+16 tap filter:
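For readers unfamiliar with the light-picking step named above, here is a minimal CPU-side sketch of single-slot weighted reservoir sampling. It only illustrates the selection; the spatial reuse and occlusion parts of the idea are not covered, and the function name and the use of Python's `random.Random` are assumptions for illustration.

```python
import random


def pick_one_light(weights, rng):
    """Single-slot weighted reservoir sampling over a stream of weights.

    Returns (index, total_weight); index is None if no weight is positive.
    Light i ends up selected with probability weights[i] / total_weight.
    """
    chosen = None
    total = 0.0
    for i, w in enumerate(weights):
        if w <= 0.0:
            continue  # lights with no contribution are never picked
        total += w
        # Replace the current pick with probability w / running total.
        if rng.random() < w / total:
            chosen = i
    return chosen, total


rng = random.Random(1)
index, total_weight = pick_one_light([1.0, 2.0, 5.0], rng)
```

The returned total weight is what an estimator needs to turn the single pick into an unbiased sum over all lights: divide the chosen light's contribution by `weights[index] / total_weight`.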
Okay, I finally finished the blog post about voxel space raytracing. Hope you like it! blog.tuxedolabs.com/2018/10/17/fro
How To become an advanced graphics programmer:
Some general advice and tips from me, an expert graphics programmer
huge thread below.
Hey and (or other folks), what's the basic idea for dithering when you have a floating point target? There's no natural dither size eg 1/255 to add ...
Another small volumetric path tracer logical step: I added light sampling (sun on top of the sky, still single scattering only).
The most elegant UE4 RHI hack... Apparently I need to ship this because Turing (RTX) drivers have the same UINT clear bug that Intel and AMD fixed 1+ year ago. Spec allows using UINT clear to clear unorm/float (bitwise fill), apparently nobody reads DX11 spec "Remarks" sections.
I was going to call it "why geometry shaders are slow" but Intel had to go and be different :)
Quote Tweet
Replying to @karolgasinski and @JoshuaBarczak
Correct for the newer chips, although there's still plenty of that Intel hardware out there. But the fundamental reasons why Intel doesn't suck as much (the ability to run narrower SIMD) is still present.
But still don't use the GS.
Optimization for forward shading pixel shaders with light tile/froxel load: Instead of loading the same light with each pixel, load a different light with each pixel inside a 2x2 quad and use quad swizzle to broadcast light data in light loop. This reduces number of loads by 4x.
AMD and Nvidia fragment shader waves (Vulkan 1.1). Red channel = some lanes in wave are disabled (SIMD not fully utilized). Green channel = 16x16 tile quantization. 100% = all lanes are in same 16x16 tile. Each 25% darker green color = wave covers one more 16x16 tile.
On Nvidia GPUs (tile binning raster), you might actually be able to achieve perfect wave scalarization in full screen passes (one triangle) when reading screen aligned 2d/3d grids. Thus you could use all 32 lanes of a wave to load light/decal array -> swizzle -> loop per pixel.
The advantages of using a reverse depth buffer are well documented (developer.nvidia.com/content/depth-, mynameismjp.wordpress.com/2010/03/22/att) but it seems to be very important with hybrid raytracing as well, where we recreate world pos from stored depth and then compare with normal triangles' distance.
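A small experiment shows why reversed depth helps when positions are reconstructed from stored depth. This sketch assumes a 32-bit floating-point depth buffer and a [0, 1] depth range; the near/far values and helper names are arbitrary illustrative choices, and `to_f32` emulates 32-bit storage with the standard `struct` module.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def standard_depth(z, near, far):
    """Depth in [0, 1]: 0 at the near plane, 1 at the far plane."""
    return to_f32(far * (z - near) / (z * (far - near)))


def reversed_depth(z, near, far):
    """Reversed mapping: 1 at the near plane, 0 at the far plane."""
    return to_f32(near * (far - z) / (z * (far - near)))


def distinct_depths(depth_fn, zs, near, far):
    """Number of different stored depth values for the given view distances."""
    return len({depth_fn(z, near, far) for z in zs})


near, far = 0.1, 10000.0
distant = [9000.0 + i for i in range(1000)]  # 1000 surfaces, 1 unit apart
n_standard = distinct_depths(standard_depth, distant, near, far)
n_reversed = distinct_depths(reversed_depth, distant, near, far)
```

With the standard mapping, distant surfaces crowd into the few float values just below 1.0, so many different distances store the same depth. The reversed mapping places them near 0.0, where floats are densely spaced, so a reconstructed world position is far closer to the true hit point.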
Alright! Below I'm gonna post some good links, for people who are interested in volume rendering!
AMD GCN4 (Polaris) waves form more irregular pattern than Nvidia RTX (Turing). Nvidia Maxwell/Pascal/Turing have hybrid tiled rasterizers. AMD Vega (GCN5) has hybrid tiled rasterizer. Wondering how it compares.
someone asked me a good question, so I will write down the reply as a twitter thread below :D
Question is something like: "Where do the BRDF formulas come from? How can we make our own?"
Answer:
Turing has uniform datapath instruction set. No mention in any of their documents, except this: docs.nvidia.com/cuda/cuda-bina. Need to run some targeted benchmarks to find out more about it, and assemble some sort of "best practices" guide.
Every meeting with NVIDIA ever:
DEV: "Give us hardware documentation!"
NVIDIA: "Thank you for your feedback!"
Also NVIDIA: "Can you please do X using technique Y instead of Z?"
DEV: "Why?"
NVIDIA: "No reason!"
This is your daily reminder that in finite discrete representations such as floating and fixed point, there are almost no sets of four representable points that are coplanar. So, the computed intersection of a ray and a triangle is almost never on the surface if you test it!
I wrote a blog post about noise characteristics and why it matters in rendering. blog.tuxedolabs.com/2018/12/07/the
My thoughts about cross platform optimization: Why I have four console devkits on my desk and both AMD and Nvidia GPU in my workstation?
Color grading poll:
1) if you use a volume texture to do color grading, what resolution volume do you use? If not using a volume texture, what do you do?
2) if you could get the same quality result from a smaller texture but multiple texture reads, would that be interesting?
A common HLSL pitfall is that isnan()/isfinite() calls often get optimized away, because the HLSL compiler uses some fastmath-ish optimization rules, i.e. assumes non-NaN, so isnan(<assumed non-nan>) = false. Here's a Shader Playground testbed to illustrate:
shader-playground.timjones.io/0bbe04704cadda
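One workaround that is sometimes used for this pitfall, and an assumption here rather than something the tweet proposes, is to test the raw IEEE 754 bit pattern instead of calling the intrinsic, since an integer comparison is less likely to be folded away under a no-NaN assumption. The sketch below shows the bit logic in Python; in HLSL the reinterpretation would be done with `asuint()`. The helper names are hypothetical.

```python
import struct


def float_bits(x):
    """Bit pattern of x stored as a 32-bit float, similar to HLSL asuint()."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_nan_bits(x):
    """NaN test on raw bits: exponent all ones and a non-zero mantissa."""
    return (float_bits(x) & 0x7FFFFFFF) > 0x7F800000


def is_finite_bits(x):
    """Finite test on raw bits: the exponent field is not all ones."""
    return (float_bits(x) & 0x7F800000) != 0x7F800000
```

Masking off the sign bit first means the same comparison handles positive and negative NaNs, and infinities (exponent all ones, zero mantissa) are correctly reported as not NaN and not finite.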
This is how I managed to port Claybook from consoles to ~4x slower handheld. Start state: frame rate = 60 fps locked, resolution = temporally upscaled 1080p on Xbox One base model (4K on pro consoles)...
Now that people have already said highly controversial stuff like ”debugger is useless for C++ development”, I think I can share my own controversial thoughts about unit testing, DRY, copy-paste coding and function length, etc... with 20 years of C++ programming experience.
Hey Eric! I'm looking for good resources / ideas on how to batch! batch! batch! Do you have anything like that lying around? My goal here is to reduce drawcalls when I have lots of unique geometries with unique materials. Ex:
Holy poly ! :) Classic normal mapping (left) vs microfacet-based normal mapping. For metals especially it makes such a huge impact. Some artifacts on the right are my still wip impl, black fringes on the left instead come from backfacing normals and energy leaks #rombotools
Most games that reviews criticize as having bad "graphics" have poor lighting and animation. You can push limited geometry and texture very far if you get the other two right. Even just three-point lighting and distance attenuation is good, and VR makes it cheap to perform mocap
I'm about to make a series of highly controversial graphics statements:
When making a rendering API, you have to decide how real-number coordinates map to integer-indexed pixel squares in a 2D array that is the image (or 3D, if you're working with voxels)...
For texture minification using mip-mapping, we usually let the hardware handle everything.
But I was curious how the hardware actually implements it, so I tried to reimplement it myself in a shader.
Video of my results below. My texture minification appears to work :D
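The tweet does not show its shader, so the following is not that implementation. It is a sketch of the textbook level-of-detail selection (the form given in the OpenGL specification): scale the screen-space UV derivatives to texel units, take the longer of the two footprint axes, and use its base-2 logarithm as the mip level. The function and parameter names are illustrative assumptions; real hardware adds biasing, anisotropy and its own approximations.

```python
import math


def mip_level(duv_dx, duv_dy, tex_w, tex_h, num_levels):
    """Pick a mip level from per-pixel UV derivatives.

    duv_dx and duv_dy are (du, dv) changes per one-pixel step in screen
    x and y, similar to ddx(uv) and ddy(uv) in a pixel shader.
    """
    # Scale the derivatives from UV units to texel units.
    len_x = math.hypot(duv_dx[0] * tex_w, duv_dx[1] * tex_h)
    len_y = math.hypot(duv_dy[0] * tex_w, duv_dy[1] * tex_h)
    rho = max(len_x, len_y)  # texels covered per pixel along the longer axis
    if rho <= 1.0:
        return 0.0  # magnification: stay on the most detailed level
    return min(math.log2(rho), num_levels - 1.0)


# A 256x256 texture where each screen pixel spans 4 texels in both axes.
level = mip_level((4 / 256, 0.0), (0.0, 4 / 256), 256, 256, 9)
```

The fractional part of the returned level is what trilinear filtering would use to blend between the two nearest mips.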
Found some screenshots from a game I worked on, no longer in production. All snow is applied in a screen-space pass, after the g-prepass, modifying the albedo, normal, metalness and roughness in the g-buffer before the lighting pass.
A few interesting posts on the Eidos website (thread):
Deferred+: next-gen culling and rendering for dawn engine eidosmontreal.com/en/news/deferr
Rendering cluster bboxes with rasterizer provides better culling accuracy than HiZ pyramid. But beware, the devil is in the details:
Quote Tweet
A few interesting posts on the Eidos website (thread):
Deferred+: next-gen culling and rendering for dawn engine eidosmontreal.com/en/news/deferr
So what do people do these days to prevent light leaking through thin walls in irradiance volumes? Especially small walls inside open spaces (so can't simply define "rooms" with different volumes)
So, if I write to a UAV from a pixel shader with no synchronization, collisions are just a race, right? Not something crazy like undefined behavior?
"premultiplied alpha"
why?
because interpolation!
imagine side by side texels: 255,255,255,255
(opaque white)
and
0,0,0,0
(invisible black)
the halfway point is
127,127,127,127
as in, NASTY *GREY* FRINGE, when for sure you wanted
255,255,255,127
(seethru white)
premult=👍
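The same argument in runnable form, as a small sketch with colours as floats in [0, 1] instead of 0..255. The helper names and the blue background are illustrative assumptions. The interpolated texel is identical in both cases; only the way it is composited differs.

```python
def lerp(a, b, t):
    """Component-wise linear interpolation, as a texture filter would do."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def over_straight(src, background):
    """Composite treating src RGB as straight (non-premultiplied) colour."""
    r, g, b, a = src
    return tuple(c * a + d * (1.0 - a) for c, d in zip((r, g, b), background))


def over_premultiplied(src, background):
    """Composite treating src RGB as already multiplied by alpha."""
    r, g, b, a = src
    return tuple(c + d * (1.0 - a) for c, d in zip((r, g, b), background))


opaque_white = (1.0, 1.0, 1.0, 1.0)
invisible_black = (0.0, 0.0, 0.0, 0.0)
halfway = lerp(opaque_white, invisible_black, 0.5)  # (0.5, 0.5, 0.5, 0.5)

blue = (0.0, 0.0, 1.0)
fringe = over_straight(halfway, blue)      # (0.25, 0.25, 0.75): dark grey mix
clean = over_premultiplied(halfway, blue)  # (0.5, 0.5, 1.0): half white
```

The premultiplied result equals what compositing see-through white, straight (1, 1, 1, 0.5), would give, which is the outcome the tweet says you wanted.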
OK, time to fess up to this grody hack. It's a simple thing, and won't really shock anybody who's actually shipped a game. But those of you with a sensitive algorithmic disposition may need to sit down and not be drinking anything. /1
Whether you are working on PBR, shadows, area lights or GI it always helps having a "ground truth" raytraced image for reference. If you can't make your own, Mitsuba is an easy to use pathtracer that can give good results. mitsuba-renderer.org (thread)
Meanwhile, I'm integrating xatlas (github.com/jpcy/xatlas) to Bakery. So far so good, it definitely does a better job than the built-in Unity unwrapper. Comparing with identical padding here
With all the AMD GCN architecture and scalarization blog posts recently, here's my understanding of NVIDIA Volta & Turing architectures' Independent Thread Scheduling features, and whether they affect the performance of divergent code:
GPUs execute 2d compute dispatch groups in scanline linear order. Not good for L1$ if your shader processes 2d area local data (blur, SSAO, SSR). Nvidia GDC 2019 slides show up to 47% perf increase by tiling your group.xy with a few ALU instructions. Claybook does similar stuff.
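One way to picture the group tiling is as a remap from the scanline launch index to a grid coordinate that walks the dispatch in narrow vertical strips, so groups launched close together in time touch neighbouring screen regions. This is a sketch of that idea only, not the code from the Nvidia slides or from Claybook; the function name and the requirement that the grid width is a multiple of the strip width are simplifying assumptions.

```python
def tiled_group_id(linear_id, groups_x, groups_y, tile_w):
    """Map a scanline-ordered group index to an (x, y) group coordinate
    that walks the grid in vertical strips `tile_w` groups wide.

    Assumes groups_x is a multiple of tile_w.
    """
    groups_per_strip = tile_w * groups_y
    strip, within = divmod(linear_id, groups_per_strip)
    y, x_in_strip = divmod(within, tile_w)
    return strip * tile_w + x_in_strip, y


# An 8x4 grid of thread groups walked in strips that are 2 groups wide.
order = [tiled_group_id(i, 8, 4, 2) for i in range(8 * 4)]
```

In a shader the same arithmetic costs a few integer instructions on the group ID, which matches the "few ALU instructions" mentioned above.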
Nvidia’s GDC talk explains my optimization process perfectly. I have been saying this for years now: Only having occupancy graph leads to bad conclusions. You need high frequency graph of unit utilizations.
A lesson in poor cache coherency while ray tracing. Same camera position, same scene.
640 x 480 - 11.76ms
1280 x 720 - 8.54ms
1920 x 1080 - 7.0ms
You can wave hello to DX11 wave intrinsics now! Formally added with our latest driver release (build 6618). Check out our release notes for a link to GitHub with the HLSL header file. Thanks again for the feedback! downloadcenter.intel.com/download/28646
Quote Tweet
Feature request to Intel: Please add DX11 extension for wave intrinsics. It’s the last remaining piece to allow widespread adoption of wave intrinsics on PC. NVAPI / AMD AGS already expose wave intrinsics on DX11. DX12 / Vulkan have built in support. All relevant HW has support.
To the lovely CG folk who follow me here: what are the great “must read” classic Computer Graphics papers from the pre-gpu era that still hold up today?
Looking for academic papers or technical documents discussing high-quality texture baking using ray tracer(without rasterizer). e.g. UV map generation, uniformly random sampling baking points over mesh faces, etc.