Why is this code insanely slow when not statically linked? https://gist.github.com/9556fa66886c087b8155d36ff9394109 …
Replying to @jedisct1
The question is why it is not insanely slow on both. You're mixing VEX-encoded instructions with non-VEX, huge perf penalty.
And therein lies the answer: with dynamic linking, the upper halves of the AVX registers are not zero, whereas with static linking they are.
Add "vzeroupper" before the loop, and timings become the same.
Replying to @sevenps
Indeed, vzeroupper or vzeroall fixes it! Wow. Good to know. Thanks a lot, Samuel. You rock.
Replying to @jedisct1
If this is real code, you should really use "vmovdqu" there, avoids having to worry about VEX state transitions.
Replying to @sevenps
This is from the sandy2x montgomery ladder implementation.
2:18 PM - 16 May 2016
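For readers who want to see the vzeroupper fix from this thread in a compilable form, here is a minimal sketch in C. It is not the code from the gist and not the sandy2x implementation: the function names `sum_avx` and `sum_floats` are hypothetical, and it assumes GCC or Clang on x86 (for the `target("avx")` attribute and `__builtin_cpu_supports`). It only shows where the `_mm256_zeroupper()` intrinsic, which emits `vzeroupper`, would go: after the 256-bit work and before control returns to code that may contain non-VEX SSE instructions.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: sums n floats with 256-bit AVX instructions.
 * Assumes n is a multiple of 8. */
__attribute__((target("avx")))
static float sum_avx(const float *p, size_t n)
{
    assert(n % 8 == 0);

    __m256 acc = _mm256_setzero_ps();
    for (size_t i = 0; i < n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));

    float lanes[8];
    _mm256_storeu_ps(lanes, acc);

    /* Clear the upper 128 bits of every YMM register before returning.
     * Callers may run legacy (non-VEX) SSE code, which is where the
     * transition penalty discussed in the thread would show up. */
    _mm256_zeroupper();

    float s = 0.0f;
    for (int i = 0; i < 8; i++)
        s += lanes[i];
    return s;
}
#endif

/* Hypothetical entry point: uses the AVX path when the CPU supports it
 * and n is a multiple of 8, otherwise falls back to a scalar loop. */
float sum_floats(const float *p, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (n % 8 == 0 && __builtin_cpu_supports("avx"))
        return sum_avx(p, n);
#endif
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i];
    return s;
}
```

Compilers typically insert `vzeroupper` on their own at the end of functions built for AVX, so the explicit call here is mostly illustrative; hand-written assembly, as in the gist, gets no such help. The other suggestion in the thread, using `vmovdqu` instead of `movdqu`, corresponds to building the whole translation unit with `-mavx`, so that 128-bit operations are VEX-encoded as well and no mixing occurs.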