The majority got it wrong. The compilers actually optimize the less-optimizable version more. https://twitter.com/RichFelker/status/1028357639347036160 …
Particularly in the case where K is a multiple of 32.
Replying to @RichFelker
Does it depend on the architecture? x86 implicitly does the mask, IIRC, i.e., only the low 5 bits of the shift amount are used. I _think_ ARM uses more bits.
Replying to @stevecheckoway
Not sure. But in the first case it doesn't matter, because shifting by 32 or more is UB. There's no reason to actually do the sub if K is a multiple of 32.
Replying to @RichFelker @stevecheckoway
In the second version, by contrast, omitting the sub and the mask relies on the x86 masking behavior. (Actually, the sub can be optimized out unconditionally if K%32==0; maybe that's what gcc and clang are doing.)
2:00 PM - 12 Aug 2018