I was wrong about mitigations making a big difference, at least on newer kernels which use PCID and other performance mitigations. I get only a 15% difference with mitigations=off: https://gist.github.com/travisdowns/be7ab2947d4b28e07ad4421a79ee6d09
However, 2 MB pages make a huge difference: 4x to 5x faster, up to ~14 GiB/s. I guess you weren't getting (many) hugepages because none were easily available: try a reboot. Setting transparent hugepages to "always" and doing a regular allocation is not the same as having it ...
New conversation
It's amazing, but as a bit of a pet peeve: what is the single-thread memory throughput? I am not sure about dozens of GB/s; I think it's more likely 4-8 GB/s, no?
4 GB/s for a memory copy… that would be slow, no?
New conversation
Probably depends on the underlying allocator in use? Because 1-2 GB/s seems too low compared to TLAB bump-pointer allocation + zeroing in the Java world: https://shipilev.net/jvm/anatomy-quarks/7-initialization-costs/
New conversation
I investigated this a few years ago, on Windows. In addition to the cost of faulting in the pages there is, eventually, the cost of removing the pages from your process. And on Windows the cost of zeroing the pages (between free and alloc) is also hidden: https://randomascii.wordpress.com/2014/12/10/hidden-costs-of-memory-allocation/
That is, on Windows when you run "new char[s]();" you may still not be measuring the cost to zero the pages (done by the system process) and you're definitely skipping the surprisingly expensive free costs! Transparent huge pages sound great.
New conversation
Fresh memory allocation like this is limited by the speed of faulting in each 4 KB page, and since Spectre and Meltdown this has gotten a lot worse (the cost of each fault has increased significantly). The fastest way for 4 KB pages, when you know you are going to use it all, is mmap() with MAP_POPULATE, since that tells the OS to go ahead and fault in all the pages now, avoiding any later faults. I get to about 10 GB/s on Linux with this approach.
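The MAP_POPULATE approach described above can be sketched as follows (standard Linux flags; the 256 MiB size is an arbitrary choice for illustration):

```c
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    size_t len = (size_t)256 << 20; /* 256 MiB */
    /* MAP_POPULATE pre-faults the entire range inside the mmap
       call itself, so later accesses take no page faults. */
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }
    p[0] = 1;        /* already resident: no fault */
    p[len - 1] = 1;  /* likewise */
    printf("populated %zu bytes\n", len);
    munmap(p, len);
    return 0;
}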
