TIL in "gcc -Os sucks": -Os insists on emitting div instead of a multiply for division by a constant to save a few bytes, yet it still happily inlines huge struct assignments as a sequence of SSE movs rather than a single "rep movsq"... 🤦🤦🤦
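A minimal sketch for reproducing the comparison, assuming a hypothetical 256-byte struct and hypothetical function names (`struct big`, `div_by_10`, `copy_big`); none of these come from the original post, and the instructions actually emitted depend on the GCC version and target:

```c
#include <stdint.h>

/* Hypothetical 256-byte struct: big enough that the compiler has to pick
 * a copy strategy (inline vector moves, rep movs, or a memcpy call). */
struct big {
    uint64_t words[32];
};

/* Division by a constant. The compiler may emit a hardware div, or a
 * multiply by a precomputed reciprocal followed by a shift. */
uint32_t div_by_10(uint32_t x)
{
    return x / 10;
}

/* Whole-struct assignment, the case the post complains about. */
void copy_big(struct big *dst, const struct big *src)
{
    *dst = *src;
}
```

Compiling this with `gcc -Os -S` and again with `gcc -O2 -S`, then comparing the two assembly files, shows which choices a given compiler makes for each function.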
It's been a while since I did anything at this level, but what I remember is that modern x64 CPUs really disfavor rep movs* and it's quite slow. So SSE moves are probably quite a bit faster, albeit with larger code.
It's been one of the fastest ways to do large copies since Ivy Bridge. It's still not good for small copies, since it has a substantial start-up cost. AVX has competitive performance, at the expense of slowing down everything else due to the AVX (frequency) offset, since it uses way more hardware.
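For anyone who wants to measure this themselves, here is a rough sketch of a copy routine built on `rep movsb`, assuming x86-64 with GCC or Clang inline assembly. The name `copy_rep_movsb` is hypothetical, and the routine falls back to `memcpy` on other targets:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes with `rep movsb` on x86-64 GCC/Clang,
 * plain memcpy everywhere else. Returns dst, like memcpy. */
void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && defined(__GNUC__)
    void *d = dst;
    /* rep movsb copies RCX bytes from [RSI] to [RDI] and updates all
     * three registers. The ABI guarantees the direction flag is clear. */
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(src), "+c"(n)
                     :
                     : "memory");
    return dst;
#else
    return memcpy(dst, src, n);
#endif
}
```

Timing this against `memcpy` for a range of sizes is one way to see where the start-up cost stops mattering on a particular CPU.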


