[1/*] I wanted to post a brief illustration of how hard it is to use CLANG when you're trying to write anything that needs to be carefully optimized. I am not cherry-picking this - this happens to me literally all the time with CLANG.
[4/*] The first version of CLANG that seems to compile AVX2 code is 3.1. It generates the correct ASM, which, as you can see, is basically a verbatim translation. It has moved up the or's to the earliest place they could issue, which is probably not necessary, but might help. pic.twitter.com/TjR7CiDxj1
[5/*] The codegen remains sane through to 3.9.1. It removes the unnecessary stack push/pop, no longer moves up the or, and puts the cmp's in the same order as the input (3.1 rearranged them). But as far as I know, this will execute basically the same. pic.twitter.com/Hutmzu4J37
[6/*] In 4.0.0, everything goes haywire. Since I've never looked at CLANG's optimizer, I won't pretend to know why it decides to try to split the 256-bit regs into 128-bit lanes. I assume it's just a bug, or something? But this code is terrible: pic.twitter.com/oaMbBCz329
[7/*] This continues for three years, up through 10.0.1. There are slight variations but the code remains mostly the same. pic.twitter.com/7hYeuSXtSP
[8/*] Finally, in 11.0.0, the correct codegen is restored. pic.twitter.com/ruyLZIBVA8
[9/*] This is what makes it so hard to work with CLANG for optimized code :( You never have any idea when it is going to critically break. You may carefully inspect some part of your code, and verify that it compiles to sane ASM... but it can change radically from a version bump.
[10/*] And furthermore, since the optimizer changes so dramatically from release to release, they may fix something broken (resulting in a speed win) and break something else (speed loss), so you can't even just use benchmarks to know things didn't regress.
[11/*] You can use a profile to know the _total_ speed of your program didn't get slower, but several parts may have gotten slower due to CLANG optimizer freakouts, and as long as they improved other parts as much (or more!), you won't even know until you reanalyze manually.
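To make that concrete, here is a rough sketch of the per-routine comparison you'd need on top of the total. Everything in it is hypothetical: the `RoutineTiming` struct, the `count_regressions` helper, the routine names, and the timings are all made up for illustration, not taken from a real profile.

```c
#include <stddef.h>

/* Hypothetical per-routine timings (milliseconds) from a profiler, measured
   for the same program built with two compiler versions. */
typedef struct {
    const char *name;
    double old_ms;
    double new_ms;
} RoutineTiming;

/* Returns how many routines got slower by more than `tolerance` (a fraction,
   e.g. 0.05 for 5%), and writes the summed totals to the out-parameters. */
static size_t count_regressions(const RoutineTiming *t, size_t n,
                                double tolerance,
                                double *old_total, double *new_total)
{
    size_t regressions = 0;
    *old_total = 0.0;
    *new_total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        *old_total += t[i].old_ms;
        *new_total += t[i].new_ms;
        if (t[i].new_ms > t[i].old_ms * (1.0 + tolerance)) {
            ++regressions;
        }
    }
    return regressions;
}

/* Made-up numbers: the total improves while two routines regress. */
static const RoutineTiming example_timings[] = {
    { "decode", 10.0, 14.0 },
    { "filter", 20.0, 25.0 },
    { "encode", 30.0, 15.0 },
};
```

With these made-up numbers the total drops from 60 ms to 54 ms, so a whole-program check passes, while `count_regressions` with a 5% tolerance still flags two of the three routines as slower.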
[12/*] I am not an optimization person. I maybe spend a week every three months looking at ASM. This has happened to me _three times in the last year_. Almost 100% of the time I've looked at CLANG ASM, I have hit something like this.
[13/*] I'm not sure what the solution to this problem is. But I do think it would be a nice start for people to recognize that it _is_ a problem, because I think it really is.
[14/*] As an example of a possible direction to go, perhaps we could have a new directive to the compiler that you could bracket pieces of code with that you have crafted carefully and analyzed.
[15/*] This directive could do some things, like limit reordering, try to translate intrinsics as directly as possible, etc., with an emphasis on _predictability_ rather than speed.
[16/*] This would allow programmers to have some confidence that when they've worked out the best way to do something, they can lock that in and know it won't get undone by crazy optimizer steps.
[17/*] The only alternative to something like this would be to write these parts in inline assembler, which honestly, would be fine with me - except the syntax for inline assembler is so horrid that it makes it very unpleasant to use.
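For anyone who hasn't seen it, this is roughly what the GCC-style extended inline asm that CLANG accepts looks like for a single add. It's a minimal sketch, assuming x86-64 and a GCC-compatible compiler; `add_u32` is a made-up name, and the plain C fallback is only there so the snippet builds on other targets.

```c
#include <stdint.h>

/* Adds two 32-bit values. On x86-64 with GCC or Clang this uses extended
   inline asm; elsewhere it falls back to plain C so the sketch still builds. */
static uint32_t add_u32(uint32_t a, uint32_t b)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__("addl %1, %0"   /* AT&T order: source first, destination second */
            : "+r"(a)       /* output: a is both read and written, any register */
            : "r"(b)        /* input: b, any register */
            : "cc");        /* clobber: the instruction modifies the flags */
    return a;
#else
    return a + b;
#endif
}
```

Even for one instruction you end up juggling numbered operands, constraint strings, and a clobber list, all inside string literals and colon-separated fields.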
[18/*] So another option would be to get serious about inline assembler, and make a pleasant-to-use, well-specified ASM syntax that can be placed in C. I'd be fine with either.
[19/*] What I'm increasingly less fine with is constantly having to struggle with CLANG to get it to _stop_ doing crazy things to routines that were already basically optimal if it just translated them directly. It's very frustrating, and sometimes literally can't be fixed :(
[20/*] That's all. I figured this has happened enough times now that I should post something about it.