Is there a doc with all of this? I actually don't even know how code tells a branch predictor "this is the likely branch", although I know that gcc had a 'likely' builtin iirc.
I see you mentioned a builtin, I'll check that out too.
Look at llvm.org/docs/BranchWei.
__builtin_expect is the traditional GNU C built-in. It's traditionally used this way:
#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
__builtin_expect_with_probability is a newer built-in that also takes an explicit probability. GCC added it in GCC 9 and Clang supports it as of Clang 11.
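A minimal sketch of how these hints are typically used, assuming a reasonably recent GCC or Clang (the probability variant is not available in older releases). The helpers sum_bytes and count_below are hypothetical names made up for illustration, and the 0.9 probability is an arbitrary assumption, not a measured value.

```c
#include <stddef.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: sums a buffer, treating NULL as a rare error. */
long sum_bytes(const unsigned char *buf, size_t len)
{
    if (unlikely(buf == NULL))
        return -1;              /* hinted as the cold path */

    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += buf[i];
    return total;
}

/* Hypothetical helper: counts values below a limit, hinting that the
   comparison is true about 90% of the time (an assumed figure). */
size_t count_below(const int *v, size_t n, int limit)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (__builtin_expect_with_probability(v[i] < limit, 1, 0.9))
            count++;
    }
    return count;
}
```

The hints only affect how the compiler lays out and optimizes the code. The functions return the same results with or without them.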
Thanks. I'm not sure I understand what you mean by ordering of branches though?
Oh, as in if you have an "if/else" where do you put the code for the "if" vs the "else" relative to the branch?
Yes, since it turns into a branch instruction where one path falls through and the other jumps somewhere. If it jumps up to a block above, it looks like a loop. If it jumps down, that looks like a jump to a cold path so the fall through looks hotter. It's just implicit hints.
There usually aren't any explicit branch predictor hints in the instruction stream, but there is a lot of implicit information, and the compiler knows how to align, order and group things to help the branch predictor, etc. Some of it depends on CPU generation, so targeting a specific CPU can help a bit.
Also, it can decide when to use a conditional move (CMOV) instead of a branch, etc. Depending on the probability, it might make sense to generate code with a branch or without one. And of course it affects inlining, ordering, grouping, and how much it optimizes for perf vs. size.
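A rough sketch of the branch vs. CMOV trade-off. It assumes Clang's __builtin_unpredictable, which marks a condition as hard to predict. The unpredictable macro and both function names are hypothetical, and the macro falls back to a no-op on compilers without the builtin. Whether a conditional move is actually emitted depends on the target and optimization level, so check the generated assembly rather than relying on this.

```c
/* Use Clang's __builtin_unpredictable when available; otherwise no-op. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_unpredictable)
#    define unpredictable(x) __builtin_unpredictable(!!(x))
#  endif
#endif
#ifndef unpredictable
#  define unpredictable(x) (x)
#endif

/* Data-dependent select with no strong bias: a conditional move may
   beat a branch here, since a branch would often be mispredicted. */
int select_max(int a, int b)
{
    if (unpredictable(a > b))
        return a;
    return b;
}

/* Strongly biased condition: a well-predicted branch is usually fine,
   and the hint marks the early return as the unlikely path. */
int clamp_nonnegative(int x)
{
    if (__builtin_expect(x < 0, 0))
        return 0;
    return x;
}
```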
It can ramp up unrolling, vectorization, inlining, etc. for hot code, and for cold code it can not only avoid them but heavily optimize for size, etc. LLVM barely does some of these things, but it could, and is getting better. It's really not as smart as you would probably expect it to be.
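Besides per-branch hints, GCC and Clang also accept a function-level cold attribute, which is one way to tell the compiler that a whole function is rarely executed. A minimal sketch, assuming that attribute. The names report_failure and dot are hypothetical, and how aggressively the cold function is size-optimized or moved away from hot code depends on the compiler and flags.

```c
#include <stdio.h>

/* Rarely executed error path: marked cold and kept out of line so it
   does not bloat the hot caller. */
__attribute__((cold, noinline))
static int report_failure(const char *what)
{
    fprintf(stderr, "error: %s\n", what);
    return -1;
}

/* Hot path: dot product of two int arrays.
   Returns 0 and stores the result in *out, or -1 on a bad length. */
int dot(const int *a, const int *b, int n, long *out)
{
    if (n < 0)
        return report_failure("negative length");

    long acc = 0;
    for (int i = 0; i < n; i++)
        acc += (long)a[i] * b[i];
    *out = acc;
    return 0;
}
```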
In practice, there are only about a dozen places where this kind of metadata is taken into account but they're very important places. The main thing that makes it more disappointing than it should be is a lot of optimization passes are bad at preserving this kind of metadata.
Clang lowers __builtin_expect / __builtin_expect_with_probability into branch weight metadata in the frontend, so the builtin needs to sit directly in the condition of an if statement, or in an operand of && / ||, with nothing around it except stuff the frontend evaluates away. Clang doesn't do any inlining, etc. in the frontend. Look at the LLVM IR.
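To illustrate the placement point, here is a sketch contrasting a hint placed directly on the branch with one hidden behind a helper function. All three function names are hypothetical. Based on the description above, the second form may not carry the hint to the caller's branch in Clang, but that is an assumption worth checking in the emitted IR rather than a guarantee.

```c
/* Hint sits directly in the if condition: this is the form the
   frontend can turn into branch weights on the branch itself. */
int parse_digit_direct(int c)
{
    if (__builtin_expect(c < '0' || c > '9', 0))
        return -1;
    return c - '0';
}

/* Hint is buried in a helper's return value. The caller's branch only
   sees an ordinary function call, so the hint may be lost before any
   inlining happens. */
static inline int is_bad_digit(int c)
{
    return __builtin_expect(c < '0' || c > '9', 0);
}

int parse_digit_indirect(int c)
{
    if (is_bad_digit(c))
        return -1;
    return c - '0';
}
```

Both functions compute the same result. Only the metadata attached to the branch may differ.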
Use -S -emit-llvm with clang to output LLVM IR in textual format, once without optimization and once with optimization, and you can figure out how it works. Bear in mind a lot of the final optimization happens as part of arch-specific code generation, but most of it happens in the IR layer.
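A small file to try this on, with the commands in the header comment. The file name hint.c, the output names, and the function checked_div are all hypothetical, and -O2 is just one choice of optimization level.

```c
/* hint.c (hypothetical file name)
 *
 * Emit textual LLVM IR without and with optimization, then compare:
 *   clang -O0 -S -emit-llvm hint.c -o hint.O0.ll
 *   clang -O2 -S -emit-llvm hint.c -o hint.O2.ll
 *
 * Where branch weights are present, they show up as !prof metadata
 * attached to a branch, pointing at a node that starts with
 * !"branch_weights".
 */
int checked_div(int num, int den)
{
    /* Division by zero is hinted as the unlikely case. */
    if (__builtin_expect(den == 0, 0))
        return 0;
    return num / den;
}
```

Searching the .ll output for branch_weights is a quick way to see whether the hint survived.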

