Majority got it wrong. They actually optimize the less-optimizable version more. https://twitter.com/RichFelker/status/1028357639347036160
-
Does it depend on the architecture? x86 will implicitly do the mask, iirc. I.e., only the low 5 bits of the shift amount will be used for a 32-bit shift. I _think_ ARM will use more bits.
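For reference, a minimal sketch of what "doing the mask" looks like when it is written out in C instead of being left to the hardware. The name `shl_masked` is hypothetical and is not taken from the linked poll. In ISO C, shifting a 32-bit value by 32 or more is undefined, so the `& 31u` is what makes every count well defined; on x86 the 32-bit shift instructions already reduce the count to 5 bits, so a compiler can usually drop the explicit mask there.

```c
#include <stdint.h>

/* Hypothetical helper: left shift with the count reduced modulo 32.
 * The explicit mask keeps the shift well defined in C for any n,
 * independent of what the target's shift instruction does. */
uint32_t shl_masked(uint32_t x, uint32_t n)
{
    return x << (n & 31u);
}
```

With this helper, a count of 33 behaves like a count of 1, and a count of 32 leaves the value unchanged.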
-
Not sure. But in the first case it doesn't matter because it's UB. There's no reason to actually sub if K is a multiple of 32.
-
Whereas in the second version, omitting the sub and mask depends on the x86 behavior. (Actually the sub can be optimized out unconditionally if K%32==0; maybe that's what gcc and clang are doing.)
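A small sketch of one reading of the K%32==0 remark, assuming the masked count has the form `(K - n) & 31`. That form, the constant `K = 64`, and both function names are assumptions for illustration; the exact expressions from the poll are not reproduced in this thread. When K is a multiple of 32, the constant drops out under the mask, so only a negate of `n` is left and no subtraction from K is needed.

```c
#include <stdint.h>

#define K 64u /* hypothetical constant, chosen so that K % 32 == 0 */

/* Masked count as written, with the subtraction from K. */
uint32_t count_with_sub(uint32_t n)
{
    return (K - n) & 31u;
}

/* Same masked count with K dropped: unsigned arithmetic wraps,
 * and K contributes nothing to the low 5 bits when K % 32 == 0. */
uint32_t count_with_neg(uint32_t n)
{
    return (0u - n) & 31u;
}
```

Both functions return the same value for every `n`, which is what lets the subtraction be removed regardless of target architecture.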