[14/*] As one possible direction, perhaps we could have a new compiler directive that you could use to bracket pieces of code you have carefully crafted and analyzed.
But to answer your original question, splitting 256-bit operations into 128-bit ones would only be a win on slow-256 CPUs _if you actually did that_. The Clang code here in 4-10 doesn't. It still uses the ymm registers, so it doesn't actually avoid the 256-bit path. It's just a plain old bug.
-
(As an aside, I also doubt this is something you'd do for AMD CPUs. It would more likely be something to do for the tranche of Intel CPUs that downclocked drastically when ymm registers were used. But regardless, Clang didn't accomplish that in the buggy codegen.)
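For contrast, here is a rough sketch of what "actually doing that" could look like at the source level: an 8-float add written as two 128-bit halves with SSE intrinsics, so only xmm-width operations are requested. The helper name `add8_split128` is made up for illustration and is not the code from this thread, and a compiler is still free to choose different instructions than the intrinsics suggest.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE: 128-bit (xmm) float intrinsics */
#endif

/* Hypothetical helper: add eight floats as two independent 128-bit halves.
 * Unaligned loads/stores are used so callers need no special alignment. */
static void add8_split128(const float *a, const float *b, float *out)
{
    assert(a && b && out);
#if defined(__SSE__)
    /* Low half: elements 0..3, one xmm-width add. */
    __m128 lo = _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    /* High half: elements 4..7, a second xmm-width add. */
    __m128 hi = _mm_add_ps(_mm_loadu_ps(a + 4), _mm_loadu_ps(b + 4));
    _mm_storeu_ps(out, lo);
    _mm_storeu_ps(out + 4, hi);
#else
    /* Portable fallback for targets without SSE. */
    for (int i = 0; i < 8; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

To see whether the split survives codegen, you would still have to read the disassembly and confirm that only xmm registers appear.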
-
You are right, this does not appear to be an optimisation trade-off to avoid slow-256 code paths, but rather just an optimizer bug. Personally I would cut the Clang people some slack though; it’s a massively complex thing that works extremely well most of the time ;-)