[12/*] I am not an optimization person. I maybe spend a week every three months looking at ASM. This has happened to me _three times in the last year_. Almost 100% of the time I've looked at CLANG ASM, I have hit something like this.
Discussing the performance of this is a bit difficult on Twitter, but the point here is not to talk about the performance of it. Regardless of which version you think is better, you either think CLANG 4 through 10 were producing a bad version, or you think CLANG <4 and >10 were.
So if you think CLANG was doing something _right_ in 4 through 10, then you think it has a codegen bug _now_ :)
But to answer your original question, splitting 256's up into 128's would only be a win on slow-256 CPUs _if you actually did that_. The CLANG code here in 4-10 doesn't. It still uses the ymm registers, so it doesn't actually avoid the 256-bit path. It's just a plain old bug.