Conversation

I don't think SMT would be widespread if it wasn't such a great marketing tool. They use it to claim there are 2x the number of cores by talking about "logical cores" which is pretty much nonsense. The performance benefits are overrated and it can often impose a cost instead.
I have a lot of workloads where I get slightly better performance with it, but they could potentially be providing more performance by dropping it from the hardware and using the resources / complexity for something else like a larger L1 cache and TLB.
I also have a lot of workloads where I don't get better performance, or where it hurts performance. The threads share an L1 cache and it's not necessarily a good thing to switch between them on stalls and have them pollute the cache used by the other thread. Can hurt latency too.
There's also the issue that most applications will detect 2x the number of cores and spawn twice as many threads for worker pools. In jemalloc, it will result in having twice as many arenas, and to an extent that makes sense since the threads would block each other with locking.
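To make the detection point concrete, here is a rough sketch of how an application could size a worker pool from physical cores rather than logical ones. It assumes Linux and the sysfs `thread_siblings_list` topology files; the function names (`count_physical_cores`, `read_sibling_lists`, `worker_count`) are made up for illustration, and it falls back to the logical count when the topology can't be read. Whether one thread per physical core is actually the better choice depends on the workload.

```python
import os
from pathlib import Path


def count_physical_cores(sibling_lists):
    """Count distinct cores given one thread_siblings_list string per logical CPU."""
    # Logical CPUs on the same core report the same sibling string
    # (for example "0,4" or "0-1"), so unique strings correspond to cores.
    return len({s.strip() for s in sibling_lists if s.strip()})


def read_sibling_lists(root="/sys/devices/system/cpu"):
    """Read the Linux sysfs topology files; returns [] if they are unavailable."""
    lists = []
    for path in Path(root).glob("cpu[0-9]*/topology/thread_siblings_list"):
        try:
            lists.append(path.read_text())
        except OSError:
            pass  # offline CPU or restricted sysfs; skip it
    return lists


def worker_count():
    """Prefer the physical core count, falling back to the logical count."""
    physical = count_physical_cores(read_sibling_lists())
    return physical or os.cpu_count() or 1


if __name__ == "__main__":
    print("logical CPUs:", os.cpu_count())
    print("workers to spawn:", worker_count())
```

On a 4-core machine with SMT enabled, `os.cpu_count()` would typically report 8 while the sibling lists collapse to 4 unique entries. Note that neither number accounts for CPU affinity masks or container limits.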
For an overall system though, it's causing many applications to use substantially more resources (2x as many threads in worker pools, 2x as many arenas / buckets for that approach to fine-grained locking) for very little benefit, because there aren't really 2x as many cores.
If you look up gaming benchmarks with hyperthreading, you'll see a lot of benchmarks where it causes a 1-5% performance hit and few with any benefit, because those workloads are rarely bound by the number of cores, but the OS will still be assigning multiple threads per core.