informed loop unrolling, etc.) and flexible like uniforms in that they can be set at runtime (not without a cost that uniforms don't suffer from, but still after the shader has already been compiled to SPIR-V). Unfortunately, D3D12 doesn't provide anything like that. [...]
Consider this HLSL pixel shader. On lines 1 and 2 we have a boolean and an integer, respectively, that we would like to be different across multiple versions of the shader. So far, the only option was to programmatically patch the HLSL code and compile every time. [...]
Image
This method makes it possible to act directly on the DXIL representation of a shader, so there's no need to go all the way back to HLSL and pay the cost of parsing a high-level language again and again.
1. HLSL PATCHING [...]
First, replace every `define` with a `volatile` integer variable and assign consecutive values, starting with 0, as in the picture. Our former BLACKOUT define becomes `sc_blackout` (our SC 0). Our former BRIGHTEN_PASSES define becomes `sc_brighten_passes` (our SC 1). [...]
Image
By using `volatile` on those variables we can be sure that DXC can't optimize out anything about them, so they will stay even at `-O3` in the generated DXIL. This is very important. Optimizations will happen later. [...]
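As a rough illustration of the HLSL patching step, here is a minimal Python sketch that turns selected `#define` lines into numbered `volatile` variables. The function name `defines_to_volatiles` is hypothetical, and the exact declaration syntax (a plain `volatile int` emitted where the define used to be) is an assumption, since the real form is only visible in the picture. It also ignores preprocessor conditionals such as `#if`, which would need separate handling.

```python
import re


def defines_to_volatiles(hlsl_source, sc_names):
    """Replace `#define NAME ...` lines with volatile ints numbered 0, 1, 2...

    The number assigned is the SC index, not the original value of the define.
    """
    index_of = {name: i for i, name in enumerate(sc_names)}
    out_lines = []
    for line in hlsl_source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\b", line)
        if m and m.group(1) in index_of:
            name = m.group(1)
            # Assumed declaration form; the thread only shows it in an image.
            out_lines.append(f"volatile int sc_{name.lower()} = {index_of[name]};")
        else:
            out_lines.append(line)
    patched = "\n".join(out_lines)
    # Uses of the old macro name now have to read the variable instead.
    for name in sc_names:
        patched = re.sub(rf"\b{re.escape(name)}\b", f"sc_{name.lower()}", patched)
    return patched
```

Calling it with `["BLACKOUT", "BRIGHTEN_PASSES"]` gives `sc_blackout` the value 0 and `sc_brighten_passes` the value 1, matching the SC numbering used in the rest of the thread.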
3. DXIL PATCHING
The goal of this section is to prepare our DXIL so it can be quickly patched with the SC values we want whenever we create a new pipeline. To be clear, everything in this section has to be done only once per shader and can be cached. [...]
3.2. COLLECTION OF ALLOCATIONS OF VOLATILES
We have to make a database of the constants' allocations (where our 0 and 1 live). Steps:
- Locate all the volatile stores, which look like `store volatile i32 9, i32* %59, align 4`. [...]
Image
- For each of them, follow the pointer to the source value (%59 in this example) to find the corresponding 'alloca', which looks like this: `%63 = bitcast i32* %59 to i8*`.
- Therefore, so to speak, we know that our SC lives at %63. (That's why we needed to assign consecutive values: to be able to map each store instruction to its corresponding SC. The 9 in the example would correspond to SC 9 if we had that many. For us, they are our 0 and 1.)
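The first half of the collection step can be sketched on the textual form of the IR. This is only an approximation: a real pass would walk LLVM's in-memory IR rather than run a regex over text, and the sketch stops at the pointer operand of each store instead of following it any further. The names `STORE_VOLATILE` and `collect_sc_locations` are made up for illustration.

```python
import re

# Matches lines such as: store volatile i32 9, i32* %59, align 4
STORE_VOLATILE = re.compile(
    r"store volatile i32 (\d+), i32\* (%[\w.]+), align \d+"
)


def collect_sc_locations(ir_text):
    """Map SC index (the stored literal) -> pointer register it is stored to."""
    locations = {}
    for m in STORE_VOLATILE.finditer(ir_text):
        sc_index = int(m.group(1))   # the 0, 1, ... assigned in the HLSL
        locations[sc_index] = m.group(2)
    return locations
```

Because each SC was initialised with its own index, the stored literal alone is enough to tell which SC a given store belongs to.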
3.3. PATCHING WITH CONSTANTS + CLEANUP
We'll change the code so it loads constants instead of the volatiles, and then get rid of everything that is no longer relevant.
- Locate all the volatile loads that are users (LLVM jargon) of each SC location we collected earlier. They look like `%590 = load volatile i32, i32* %61, align 4`.
- Replace each one (the %590 in the example) with a constant with a special value, computed as 0x45678900 + the index of the SC whose location we are looking at (our 0 or 1). [...]
Image
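The load replacement can be sketched the same way, again on textual IR and only as an approximation of what a real LLVM pass would do with `replaceAllUsesWith`-style rewriting. `patch_loads` and `LOAD_VOLATILE` are hypothetical names, and `sc_locations` is assumed to be a map from SC index to the pointer register the loads read from.

```python
import re

SENTINEL_BASE = 0x45678900  # sentinel value quoted in the thread

# Matches lines such as: %590 = load volatile i32, i32* %61, align 4
LOAD_VOLATILE = re.compile(
    r"^\s*(%[\w.]+) = load volatile i32, i32\* (%[\w.]+), align \d+\s*$"
)


def patch_loads(ir_text, sc_locations):
    """Drop volatile loads of SC locations and use sentinel constants instead."""
    pointer_to_sc = {ptr: idx for idx, ptr in sc_locations.items()}
    replacements = {}
    kept = []
    for line in ir_text.splitlines():
        m = LOAD_VOLATILE.match(line)
        if m and m.group(2) in pointer_to_sc:
            # The loaded register becomes the constant SENTINEL_BASE + SC index.
            replacements[m.group(1)] = str(SENTINEL_BASE + pointer_to_sc[m.group(2)])
        else:
            kept.append(line)
    patched = "\n".join(kept)
    for reg, const in replacements.items():
        # The lookahead keeps %590 from also matching inside %5900.
        patched = re.sub(re.escape(reg) + r"(?![\w.])", const, patched)
    return patched
```

For SC 1, every use of the loaded register ends up as the literal `0x45678900 + 1`, which is what the later serialization step looks for.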
4. SC BIT OFFSET DATABASE
During the serialization to DXIL, a few changes made to our copy of portions of LLVM do a good thing for us. Namely, whenever a constant based on 0x45678900 (the sentinel number from earlier) is being streamed out to the bitcode, a pair is recorded into a map.
- The key is the value of the constant mod the sentinel, which is our SC id (our 0 or 1, again).
- The value is the count of bits written to the stream so far, which is nothing other than the bit offset where our SC lives in the bit stream. [...]
Another nice thing our patch to DXC's LLVM does is replace the value of the constant with a known, fixed value (1000001b) that we can later compare against to verify we're about to patch the right spot. [...]
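A toy model of that bookkeeping, assuming a much simpler writer than LLVM's real bitcode writer: `BitWriter`, `emit_constant`, and the `MAX_SCS` range check used to recognise sentinel constants are all hypothetical, and ordinary constants are written as 32 fixed bits instead of VBR to keep the sketch short.

```python
SENTINEL_BASE = 0x45678900   # sentinel from the thread
PLACEHOLDER = 0b1000001      # known value written in place of the sentinel
SC_BITS = 7                  # SC values are constrained to 7 bits
MAX_SCS = 256                # assumption: bound used to recognise sentinels


class BitWriter:
    def __init__(self):
        self.bits = []             # one 0/1 entry per emitted bit
        self.sc_bit_offsets = {}   # SC id -> bit offset of its placeholder

    def emit(self, value, width):
        # Append `width` bits of `value`, least significant bit first.
        for i in range(width):
            self.bits.append((value >> i) & 1)

    def emit_constant(self, value):
        if SENTINEL_BASE <= value < SENTINEL_BASE + MAX_SCS:
            sc_id = value % SENTINEL_BASE
            # Record where the SC is about to be written, then write the
            # placeholder instead of the sentinel itself.
            self.sc_bit_offsets[sc_id] = len(self.bits)
            self.emit(PLACEHOLDER, SC_BITS)
        else:
            # Simplification: real bitcode would use VBR here.
            self.emit(value, 32)
```

After serialization, `sc_bit_offsets` is the database: one bit offset per SC id, each pointing at a 7-bit field that currently holds 1000001b.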
(To dodge, for now, some more complex situations caused by LLVM's VBR encoding, we are constraining our values to 7 bits.)
5. SETTING PROPER SC VALUES
This is, again, something that has to be done only once and can be cached. [...]
We have to do a first full round of patching, replacing every occurrence of 1000001b with the real default values we want in our SCs. That's easy now that we have our SC bit offset database. Later we can patch again any SC we want to give a new value to, which means repeating the next steps. [...]
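Patching a 7-bit field at a known bit offset could look like the sketch below. It assumes bits are numbered least significant first within each byte and that the field is a plain 7-bit value, which glosses over the VBR details mentioned earlier. The helper names (`read_bits`, `write_bits`, `set_sc`) are hypothetical.

```python
PLACEHOLDER = 0b1000001  # value the patched compiler leaves at each SC spot
SC_BITS = 7


def read_bits(buf, bit_offset, width):
    value = 0
    for i in range(width):
        pos = bit_offset + i
        value |= ((buf[pos // 8] >> (pos % 8)) & 1) << i
    return value


def write_bits(buf, bit_offset, width, value):
    for i in range(width):
        pos = bit_offset + i
        mask = 1 << (pos % 8)
        if (value >> i) & 1:
            buf[pos // 8] |= mask
        else:
            buf[pos // 8] &= ~mask & 0xFF


def set_sc(buf, bit_offset, new_value, expected=PLACEHOLDER):
    """Overwrite one SC in place after checking we are at the right spot."""
    if not 0 <= new_value < (1 << SC_BITS):
        raise ValueError("SC values are limited to 7 bits")
    if read_bits(buf, bit_offset, SC_BITS) != expected:
        raise ValueError("unexpected bits at SC offset; refusing to patch")
    write_bits(buf, bit_offset, SC_BITS, new_value)
```

On the first round `expected` is the placeholder; when re-patching later, it would be whatever value was written the previous time.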
6. DXIL SIGNATURE
After any change to the DXIL bit stream, the DXBC container has to be signed again. (Note that the first compilation via DXC would have already signed it if DXIL.dll is in the neighborhood, but that signature is no longer valid.) [...]
7. PIPELINE CREATION
The new patched DXBC container is suitable to be used to create a new render pipeline. Something great to know is that the GPU driver performs further optimizations on top of those done by DXC. [...]
Therefore, once our DXIL bitcode is rid of volatile variables and is just dealing with constants, the GPU driver is able to make smart decisions. Thanks to the AMD @Radeon dev tools, I've checked that the ISA assembly is indeed optimized depending on the value of the SCs. [...]
(I guess the same happens on other vendors' hardware.) In the images you can see the diff of the GCN assembly listings of the exact same shader, whose original high-level source code performs or skips a multiplication depending on the value of an SC.
Image
Image
- Values can't take more than 7 bits (therefore, integers are limited to 127). It's solvable with some extra work against the VBR encoding, though.
- Floating point SCs are not supported. It may be possible, but I haven't pursued it. [...]
- Legit values matching the sentinel may appear in regular shader code. The base sentinel value has been chosen carefully to keep the chances to a minimum, but it's theoretically possible. [...]