[1/*] I wanted to post a brief illustration of how hard it is to use CLANG when you're trying to write anything that needs to be carefully optimized. I am not cherry-picking this - this happens to me literally all the time with CLANG.
[2/*] And just to underscore that, before I start, this case arose organically while I was typing in a fragment to godbolt to check a _different_ CLANG optimizer bug :(
[3/*] Here is an extremely simple function that does almost nothing. The intrinsics in the function are essentially the ASM - you could translate them verbatim and get the "optimal" ASM output for all intents and purposes. It doesn't really need to be "optimized" at all. pic.twitter.com/Po5yFZ8eR7
[4/*] The first version of CLANG that seems to compile AVX2 code is 3.1. It generates the correct ASM, which as you can see, is basically a verbatim translation. It has moved up the or's to the earliest place they could issue, which is probably not necessary, but might help. pic.twitter.com/TjR7CiDxj1
[5/*] The codegen remains sane through to 3.9.1. It removes the unnecessary stack push/pop, no longer moves up the or, and puts the cmp's in the same order as the input (3.1 rearranged them). But as far as I know, this will execute basically the same. pic.twitter.com/Hutmzu4J37
[6/*] In 4.0.0, everything goes haywire. Since I've never looked at CLANG's optimizer, I won't pretend to know why it decides to try to split the 256-bit regs into 128-bit lanes. I assume it's just a bug, or something? But this code is terrible: pic.twitter.com/oaMbBCz329
Replying to @cmuratori
Crazy! Not that it should matter, but does making B a const pointer change anything in the compiler output of the faulty clang?
Nope. My assumption is that the lane splitting probably happens way after const has already been forgotten (it's almost a non-issue in optimizing C++ - I think it is only a front-end thing?)
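One way to see why const on a pointer gives an optimizer so little to work with: a pointer-to-const only forbids writes made through that pointer, so the pointed-to object can still change by other routes. The sketch below is a generic illustration, not the function from the thread. The names g_value, bump and read_around_call are hypothetical, and it does not try to show what CLANG actually does internally.

```cpp
// Hypothetical names throughout; illustration only.
static int g_value = 1;

// Stands in for any call that can reach the same object by another route.
void bump() { g_value += 2; }

// 'p' is a pointer-to-const, but that only restricts writes made through 'p'.
int read_around_call(const int *p) {
    int before = *p;  // first load through the const pointer
    bump();           // may legally modify the object 'p' points at
    int after = *p;   // so this load cannot be assumed equal to 'before'
    return after - before;
}
```

Calling read_around_call(&g_value) shows the value seen through the const pointer changing across the call, which is why the qualifier on its own is not a promise the compiler can build on.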