Conversation

I don't think SMT would be widespread if it weren't such a great marketing tool. Vendors use it to claim 2x the number of cores by talking about "logical cores", which is pretty much nonsense. The performance benefits are overrated, and it can often impose a cost instead.
I have a lot of workloads where I get slightly better performance with it, but they could potentially be providing more performance by dropping it from the hardware and using the resources / complexity for something else like a larger L1 cache and TLB.
I also have a lot of workloads where I don't get better performance, or where it hurts performance. The threads share an L1 cache and it's not necessarily a good thing to switch between them on stalls and have them pollute the cache used by the other thread. Can hurt latency too.
There's also the issue that most applications will detect 2x the number of cores and spawn twice as many threads for worker pools. In jemalloc, it will result in having twice as many arenas, and to an extent that makes sense since the threads would block each other with locking.
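To make the worker-pool point concrete, here is a minimal Python sketch, assuming Linux and its sysfs topology files. `os.cpu_count()` reports logical CPUs, so a pool sized from it doubles on a 2-way SMT machine. The helper names (`parse_cpu_list`, `count_physical_cores`, `worker_count`) are made up for illustration, and whether one worker per physical core is actually the right size depends on the workload.

```python
import glob
import os

# Assumption: Linux sysfs layout. Each logical CPU lists the logical CPUs
# that share its physical core, e.g. "0,4" or "0-1".
SIBLINGS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"


def parse_cpu_list(text):
    """Parse a CPU list such as '0,4' or '0-1' into a frozenset of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return frozenset(cpus)


def count_physical_cores(sibling_lists):
    """Count distinct sibling groups; each group is one physical core."""
    groups = {parse_cpu_list(s) for s in sibling_lists}
    groups.discard(frozenset())  # ignore empty or unreadable entries
    return len(groups)


def worker_count():
    """Hypothetical pool size: one worker per physical core when known."""
    logical = os.cpu_count() or 1
    sibling_lists = []
    for path in glob.glob(SIBLINGS_GLOB):
        with open(path) as f:
            sibling_lists.append(f.read())
    physical = count_physical_cores(sibling_lists)
    # Fall back to the logical count when topology is unavailable.
    return min(logical, physical) if physical else logical
```

On a 4-core, 8-thread machine, `os.cpu_count()` would give 8, while `worker_count()` would give 4.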
So for the case where there's a single application using most of the resources, it will often have a small benefit, but often no benefit. It usually slightly hurts performance for single-threaded workloads, since a non-workload thread ends up wasting resources and screwing up the cache.
If you look up gaming benchmarks with hyperthreading, you'll see a lot of benchmarks where it causes a 1-5% performance hit and few with any benefits, because those workloads are rarely bound by the number of cores, but the OS will still be assigning multiple threads per core.
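One way to test this on your own workload, without disabling SMT in firmware, is to restrict a process to one logical CPU per physical core and compare timings. The following is a rough Python sketch, assuming Linux: `os.sched_getaffinity` and `os.sched_setaffinity` are Linux-only, the sysfs path is the standard topology location, and the helper names are hypothetical. It also assumes the kernel prints sibling lists in ascending order.

```python
import glob
import os

# Assumption: Linux sysfs layout; both SMT siblings of a core report the
# same list, e.g. "0,4".
SIBLINGS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"


def first_cpu(cpu_list):
    """Return the first CPU number in a list such as '0,4' or '2-3'."""
    first = cpu_list.strip().split(",")[0]
    return int(first.split("-")[0])


def one_cpu_per_core(sibling_lists):
    """Pick one logical CPU (the first listed) from each sibling group."""
    return sorted({first_cpu(s) for s in sibling_lists if s.strip()})


def pin_to_one_thread_per_core():
    """Restrict the current process to one logical CPU per physical core."""
    sibling_lists = []
    for path in glob.glob(SIBLINGS_GLOB):
        with open(path) as f:
            sibling_lists.append(f.read())
    # Only keep CPUs this process is already allowed to run on.
    allowed = os.sched_getaffinity(0)
    cpus = set(one_cpu_per_core(sibling_lists)) & allowed
    if cpus:  # leave affinity untouched if nothing usable was found
        os.sched_setaffinity(0, cpus)
    return cpus
```

Calling `pin_to_one_thread_per_core()` before starting the workload keeps the scheduler from placing two of the process's threads on the same core. Other processes can still run on the sibling threads, so this only approximates turning SMT off.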