Yes, I was referring specifically to the knowledge behind the reply from AMD
-
there's been a fair bit of FUD around benchmarks of KPTI with absolutely-useless-out-of-context numbers like "30% penalty" thrown around; does anyone know how much actual latency it generally adds to each syscall? Presumably best measured in cycles?
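As a rough illustration of what such a raw measurement might look like (a minimal sketch, assuming x86_64 Linux with GCC/Clang; SYS_getpid stands in as a near-trivial syscall, and ITERS is an arbitrary knob):

    /* Minimal sketch: time a tight loop of direct syscalls in cycles.
     * getpid() is invoked via syscall(2) since glibc used to cache it. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>

    int main(void)
    {
        enum { ITERS = 1000000 };
        unsigned int aux;

        for (int i = 0; i < 1000; i++)       /* warm up caches/TLB */
            syscall(SYS_getpid);

        unsigned long long start = __rdtscp(&aux);
        for (int i = 0; i < ITERS; i++)
            syscall(SYS_getpid);
        unsigned long long end = __rdtscp(&aux);

        printf("~%llu cycles/syscall\n", (end - start) / ITERS);
        return 0;
    }

Comparing the output of a boot with pti=on against one with pti=off would give the per-syscall cost of the extra CR3 switches in isolation.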
-
Even when measuring raw syscall latency I think you'd need to define a workload for that. Part of the overhead is going to be around TLB management & hit ratio, which'll be heavily dependent on workload.
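One way to see that workload dependence (a sketch along the same lines as above: touch SPREAD pages between syscalls; under KPTI without PCID, user TLB entries are flushed on every kernel crossing, so the page walk gets costlier as the syscall rate rises; SPREAD and ITERS are arbitrary knobs):

    /* Sketch: interleave each syscall with a walk over SPREAD pages,
     * so post-flush TLB refills show up in the per-iteration cost. */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <x86intrin.h>

    #define PAGE   4096
    #define SPREAD 512                  /* pages touched between syscalls */

    int main(void)
    {
        enum { ITERS = 100000 };
        unsigned int aux;
        volatile char *buf = malloc((size_t)SPREAD * PAGE);
        if (!buf)
            return 1;
        for (size_t i = 0; i < (size_t)SPREAD * PAGE; i += PAGE)
            buf[i] = 1;                 /* fault all pages in up front */

        unsigned long long start = __rdtscp(&aux);
        for (int i = 0; i < ITERS; i++) {
            syscall(SYS_getpid);
            for (size_t p = 0; p < (size_t)SPREAD * PAGE; p += PAGE)
                (void)buf[p];           /* one TLB-relevant load per page */
        }
        unsigned long long end = __rdtscp(&aux);

        printf("~%llu cycles/iteration\n", (end - start) / ITERS);
        return 0;
    }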
-
Replying to @AndresFreundPol @11rcombs and others
FWIW, for a memory-resident, read-only OLTP postgres benchmark with 16 connections on my i7-6820HQ, the performance regression is a bit above 7%.
-
yeah, I don't doubt it can be significant, but a lot of the discussion I've seen has been "regression of X%" without specifying the workload at all. so, what defines the overhead of the TLB swap? the number of memory maps? (getting way out of my depth here)
-
The 30%+ numbers I've seen mostly referred to @grsecurity's benchmark of "du -s" - which is obviously extremely syscall heavy. Not quite the worst case, but pretty close. That seems like a valid thing to measure.
-
oh yeah, definitely a useful number to have, but harmful to toss around without context
-
I agree, that's why I provided the context in my tweets, which of course got ignored by regurgibloids. I should mention though that prior to my benchmarks, the only numbers were overly optimistic ones from upstream and a 0.28% claim from the KAISER paper: https://gruss.cc/files/kaiser.pdf
-
Replying to @grsecurity @11rcombs and others
If you agree 30% without context is harmful, then you should also agree the blanket claim of 0.28% (and the various incarnations of "< 1%" that were thrown about as marketing) was also harmful. This is why I suggested there should be wide-scale retesting of benchmarks from past years.
-
Doesn't count as wide-scale, even for postgres, but here are numbers for two workloads with pti=off, pti=on, pti=on & nopcid: https://www.postgresql.org/message-id/20180102222354.qikjmf7dvnjgbkxe@alap3.anarazel.de
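(For context: nopcid disables PCID tagging, which forces a full TLB flush on every user/kernel CR3 switch under PTI, roughly approximating older CPUs without PCID.) To sanity-check which mode a given boot actually ended up in, the kernel's reported state can be read back (a minimal sketch, assuming a 4.15+ kernel that exposes this sysfs file):

    /* Sketch: print the kernel's reported Meltdown mitigation state,
     * e.g. "Mitigation: PTI" or "Vulnerable". */
    #include <stdio.h>

    int main(void)
    {
        char line[128];
        FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
        if (!f) {
            puts("no mitigation info (kernel too old?)");
            return 1;
        }
        if (fgets(line, sizeof line, f))
            fputs(line, stdout);
        fclose(f);
        return 0;
    }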
-
Replying to @AndresFreundPol @grsecurity and others
your baseline should be a tree that doesn't even have the preparatory patches for PTI, say 4.14.8/4.15-rc3(?) or earlier. also make sure your benchmark pegs the cores at 100% instead of waiting for I/O.