Why does using Vector<int> only improve performance by 2x compared to the original implementation? Shouldn't it be 4-8x faster, since it uses intrinsics anyway?
-
SIMD doesn't always scale exactly as expected. Generally, 4-8x is the theoretical maximum speedup, and various other operations can slow it down. In this case, it's likely due to bounds checks that the JIT isn't able to elide.
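To make the gap between the theoretical and observed speedup more concrete, here is a minimal sketch of a `Vector<int>` sum. It is not the code from the blog post; the names `SumSketch`, `SumScalar`, and `SumVector` are hypothetical. `Vector<int>.Count` is 4 on 128-bit hardware and 8 on 256-bit hardware, which is where the 4-8x ceiling comes from, but the slicing in the loop, the final horizontal reduction, and the scalar tail all add work on top of the vector adds.

```csharp
using System;
using System.Numerics;

static class SumSketch
{
    // Baseline: one addition per element.
    public static int SumScalar(ReadOnlySpan<int> values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // Vectorized: Vector<int>.Count additions per iteration.
    public static int SumVector(ReadOnlySpan<int> values)
    {
        int width = Vector<int>.Count;
        Vector<int> acc = Vector<int>.Zero;
        int i = 0;

        // Main loop. Slice performs a range check on every iteration;
        // checks like this are a cost the JIT may not be able to remove.
        for (; i <= values.Length - width; i += width)
        {
            acc += new Vector<int>(values.Slice(i, width));
        }

        // Horizontal reduction: add the lanes of the accumulator together.
        int sum = Vector.Dot(acc, Vector<int>.One);

        // Scalar tail for the elements that do not fill a whole vector.
        for (; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }
}
```

Both methods return the same result for the same input (integer overflow wraps in both), so they can be benchmarked against each other to see how far the measured ratio falls below `Vector<int>.Count`.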
-
In the source code at https://gist.githubusercontent.com/tannergooding/ed5783418a857c1cbeb95b5b0f95754e/raw/71a662a6f1b5939c5cd8806cc111ac5d3d7464ae/SumVectorized.cs … the first "p" in "public" is missing.
-
Thanks for pointing this out. I fixed this (and another reported issue, where I was checking Sse.IsSupported rather than Sse2.IsSupported) in the gist, and it should be reflected on the blog now.
-
Neat! Probably could have used that back in 2009 when I had a use case for BSR (bit scan reverse), which I think can be calculated based off of Lzcnt: https://www.codeproject.com/Articles/43103/SlimList …
-
For some really core operations, like Lzcnt, Popcnt, and RotateLeft/Right, we now have the general-purpose `System.Numerics.BitOperations` class. This class is powered by the intrinsics when available and otherwise falls back to a generally efficient software implementation.
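As a rough illustration of the BSR idea mentioned above, here is a minimal sketch built on `BitOperations.LeadingZeroCount`. The class and method names (`BitScanSketch`, `BitScanReverse`) are hypothetical, and throwing on zero is an assumption made for this sketch, since the x86 BSR result is undefined for a zero input.

```csharp
using System;
using System.Numerics;

static class BitScanSketch
{
    // Hypothetical helper: index of the highest set bit of a nonzero
    // 32-bit value, which is what BSR computes.
    public static int BitScanReverse(uint value)
    {
        if (value == 0)
        {
            // Assumption for this sketch: reject zero rather than return
            // an arbitrary index.
            throw new ArgumentOutOfRangeException(nameof(value));
        }

        // A 32-bit value with n leading zeros has its top set bit at 31 - n.
        return 31 - BitOperations.LeadingZeroCount(value);
    }
}
```

For example, `BitScanSketch.BitScanReverse(40)` returns 5, because 40 is 101000 in binary. `BitOperations.Log2(uint)` gives the same answer for nonzero inputs and returns 0 for an input of 0, so it may be the simpler choice when that convention is acceptable.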
-
That’s a cracking blogpost!