In practice, this means (for example) that when you are memory bandwidth limited, single-core Meow hash would have topped out on an i7-7700k at 5.5 bytes/cycle - the single-core bandwidth. Threaded, the new Meow hashes at the full 7 bytes/cycle system memory bandwidth allows.
-
-
Show this thread
-
And the construction still runs at 16 bytes/cycle in cache, or even faster now if you can keep your multiple threads fed with data already in cache!
Show this thread
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.