I agree there are stronger & cheaper mitigations, but I’m a little surprised that a few extra pushes, pops, & branches add up to 2-5%. Some people on Twitter claim correctly predicted branches are free (which is dubious). There are lots of CPUs without ARM PAC or Intel CET around.
The cost is primarily reading and writing the canary on the stack. It also reads the expected value from global or thread local storage so there are 2 reads and a write, and then a branch based on the 2 reads that's set up so prediction is trivial but still wastes CPU resources.
It's a lot of overhead to add to every function that uses an array or takes the address of a local (which is determined after IR optimizations).
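A rough sketch of what that per-function cost looks like, written out by hand in C. This is not what the compiler literally emits: `guard_value` and `copy_checked` are hypothetical names, and the real guard lives in global or thread-local storage managed by the runtime. It only illustrates the extra loads, the store and the branch wrapped around a function with a local array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the guard the runtime keeps in global or
 * thread-local storage. A real guard is randomized at startup. */
static uintptr_t guard_value = 0x5bd1e995u;

/* Sums up to 16 bytes of src through a local buffer, with a hand-written
 * canary check around the buffer (illustration only). */
static int copy_checked(const char *src, size_t n) {
    assert(src != NULL);

    /* Prologue: load the guard and store a copy into a stack slot. */
    volatile uintptr_t canary = guard_value;

    char buf[16];
    if (n > sizeof buf) {
        n = sizeof buf; /* clamp so this sketch never overflows */
    }
    memcpy(buf, src, n);

    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += buf[i];
    }

    /* Epilogue: reload the stack copy, compare it with the guard, and
     * branch. The branch is trivially predictable but still executed. */
    if (canary != guard_value) {
        abort();
    }
    return sum;
}
```

In practice the compiler inserts this automatically (for example with `-fstack-protector-strong` in GCC and Clang), and only for functions that match its heuristics.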
ShadowCallStack is a replacement for the return address on the stack. It still adds a write because the return address is left on the regular stack for unwinding.
ShadowCallStack has a broader threat model protecting return addresses in general by having them in an isolated memory region which can be given a separate random base that's far harder to leak. It's cheaper than SSP for a given function but strong SSP gets to skip many of them.
ShadowCallStack's only redundancy is still writing out the standard return addresses. No extra branches, no extra reads and only writing out that standard return address while never actually reading the value unless there's unwinding where performance doesn't matter at all.
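A toy model of that scheme in C, to make the read/write pattern concrete. Everything here is an assumption for illustration: `shadow_stack`, `struct frame`, `scs_enter` and `scs_leave` are made-up names, and a plain static array stands in for the isolated memory region. The real thing is a couple of instructions in the function prologue and epilogue, not function calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCS_DEPTH 64

/* Stand-in for the isolated shadow stack region (hypothetical). */
static uintptr_t shadow_stack[SCS_DEPTH];
static size_t shadow_top;

/* Stand-in for the return address slot in a regular stack frame. */
struct frame {
    uintptr_t return_address;
};

/* Function entry: the return address is written twice. */
static void scs_enter(struct frame *f, uintptr_t ret) {
    assert(shadow_top < SCS_DEPTH);
    f->return_address = ret;          /* redundant copy, kept for unwinders */
    shadow_stack[shadow_top++] = ret; /* the copy that is actually trusted */
}

/* Function exit: only the shadow copy is read. The slot in the regular
 * frame is never consulted, so corrupting it does not redirect the return. */
static uintptr_t scs_leave(struct frame *f) {
    (void)f;
    assert(shadow_top > 0);
    return shadow_stack[--shadow_top];
}
```

Overwriting `f.return_address` between `scs_enter` and `scs_leave` has no effect on the value returned, and there is no compare or branch on the return path. With Clang the real feature is enabled with `-fsanitize=shadow-call-stack` on supported targets.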
Hardware shadow stack support like CET is ideal and I think it could easily perform better than not having structured function calls. It probably wouldn't in practice unless they designed the architecture to do function calls that way from the beginning and then built around it.
I have a Zen 3 CPU (Ryzen 9 5950X) with CET shadow stack support although I don't think it has CET coarse grained CFI support. I don't think I can measure anything because userspace CET SS support for Linux isn't upstream and I doubt Zen 3 has an optimized implementation anyway.
Not really interested in trying experimental userspace CET SS patches and then figuring out if the support included in GCC/Clang and glibc even works properly.
I know how Clang/GCC SSP, Clang CFI and Clang SCS perform on various CPUs but no clue about CET or ARM PAC/BTI/MTE.
Wild to me that Linux doesn’t have proper support for CET yet. Windows had it when those CPUs launched.
Intel had it written a long time ago but they have to appease a bunch of people who are close to having absolute veto power over their parts of the kernel unless Linus decides that something important needs to land, which mostly isn't going to happen for security features.
They have to go through dozens of iterations of the patches primarily to try to appease all kinds of subjective and often questionable / conflicting demands. Stuff tends to end up being overall higher quality by the end but sometimes things end up worse for political reasons.
It's pretty common for features to take ages to end up in the upstream kernel. Pixel phones use Clang CFI and SCS in the kernel for the Pixel 3 and later. Neither of those features is available in mainline Linux yet. Pixel 3 launched in November 2018 with CFI (got SCS later).