[13/*] I'm not sure what the solution to this problem is. But I do think it would be a nice start for people to recognize that it _is_ a problem, because I think it really is.
So if you think CLANG was doing something _right_ in 4 through 10, then you think it has a codegen bug _now_ :)
But to answer your original question, splitting 256's up into 128's would only be a win on slow-256 CPUs _if you actually did that_. The CLANG code here in 4-10 doesn't. It still uses the ymm registers, so it doesn't actually avoid the 256-bit path. It's just a plain old bug.
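For contrast, here is a rough sketch of what "actually doing that" could look like: an eight-float add written as two independent 128-bit halves, touching only xmm-width operations. The helper name `add8_split128` is made up for illustration, and this is not the code from the thread or anything Clang is known to emit; it's just the shape of a real split, assuming an x86 target with SSE.

```c
#include <xmmintrin.h>  /* SSE intrinsics: 128-bit (xmm-width) operations only */

/* Hypothetical helper, not from the thread: adds eight floats as two
   separate 128-bit halves, so no 256-bit (ymm-width) operation is requested. */
static void add8_split128(const float *a, const float *b, float *out)
{
    /* Low four lanes: unaligned 128-bit loads, one 128-bit add. */
    __m128 lo = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* High four lanes: same thing, offset by four floats. */
    __m128 hi = _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4));

    /* Two 128-bit stores write back all eight results. */
    _mm_storeu_ps(out, lo);
    _mm_storeu_ps(out + 4, hi);
}
```

The point of the sketch is only that a genuine split never asks for a 256-bit operation at all. If ymm registers still show up in the output, the 256-bit path hasn't been avoided.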
(As an aside, I also doubt this would be something you'd do for AMD CPUs. It would be more something you'd do for the tranche of Intel CPUs that drastically downclocked when ymm's were used. But regardless, CLANG didn't accomplish that in the buggy codegen.)