TFW you are still using AXPY as a representative workload to make your architectural case in 2019. https://twitter.com/fanf/status/1098171169902088192 …
-
-
Replying to @stephentyrone
"this would be much easier in my imaginary instruction set"
1 reply 0 retweets 0 likes -
Replying to @Gok
Also, I 100% guarantee you that the ia32 AXPY they're comparing against is some very naive implementation that walks up all power-of-two sizes for edging instead of using either masked or overlapping stores.
1 reply 0 retweets 1 like -
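For readers unfamiliar with the "overlapping stores" trick mentioned above, here is a minimal sketch in portable C. It is not the code from the paper or from any real BLAS. The names `VEC_W`, `axpy_block`, and `axpy_overlap` are hypothetical, and the fixed-width inner loop merely stands in for a single vector multiply-add. The idea is to compute the last full-width block from the original `y` before the main loop stores anything, then write it last, so the tail costs one extra block instead of a scalar or power-of-two cleanup ladder.

```c
#include <assert.h>
#include <stddef.h>

#define VEC_W ((size_t)4) /* hypothetical vector width, in floats */

/* One VEC_W-wide multiply-add: out[j] = a * x[j] + y_in[j].
   Stands in for a single vector instruction. */
static void axpy_block(float a, const float *x, const float *y_in, float *out) {
    for (size_t j = 0; j < VEC_W; ++j)
        out[j] = a * x[j] + y_in[j];
}

/* y[i] = a * x[i] + y[i] for i in [0, n); x and y must not overlap. */
void axpy_overlap(size_t n, float a, const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));

    if (n < VEC_W) { /* too short for even one block: plain scalar loop */
        for (size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
        return;
    }

    /* Compute the final block from the ORIGINAL y, before any store. */
    float tail[VEC_W];
    axpy_block(a, x + (n - VEC_W), y + (n - VEC_W), tail);

    /* Main loop over all complete blocks, updating y in place. */
    for (size_t i = 0; i + VEC_W <= n; i += VEC_W)
        axpy_block(a, x + i, y + i, y + i);

    /* Store the tail last. Where it overlaps the final complete block
       it overwrites those elements; everything past that block gets
       its only update here. */
    for (size_t j = 0; j < VEC_W; ++j)
        y[n - VEC_W + j] = tail[j];
}
```

Loading the tail before the main loop matters because AXPY updates `y` in place: recomputing the overlapped elements after they were already stored would apply `a * x` to them twice.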
Replying to @stephentyrone @Gok
(Reads code listing). Oh, no. It's worse.
1 reply 0 retweets 2 likes -
Replying to @stephentyrone @Gok
I especially like how they've chosen a not-unrolled implementation of the core loop that literally cannot run at peak and used scalar edging in order to artificially boost the count of instructions retired.
2 replies 0 retweets 1 like -
Replying to @stephentyrone @Gok
Is there anyone who knows anything over at SIGARCH skimming through this stuff?
2 replies 0 retweets 2 likes -
Replying to @stephentyrone @Gok
Like, there's actually a really good point to be made about this stuff, but this paper kinda catastrophically fails to make it, or to even articulate what that point is.
3 replies 0 retweets 1 like -
Replying to @stephentyrone @Gok
Do you lean toward SIMD or vector architectures? I’m not an expert here, but there is a nice simplicity to the vector code, if nothing else.
2 replies 0 retweets 0 likes -
Thinking of it as either-or is a mistake. Even if you go whole-hog on vector, you want to have "vector-of-SIMD" to support working on vectors of complex, quaternions, interleaved color channels, etc. ARM's SVE is the closest thing to a real proposal that's in public view.
1 reply 1 retweet 2 likes -
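One way to read the "vector-of-SIMD" point, sketched as a scalar reference in C: with complex values stored interleaved (re, im, re, im, ...), each logical element spans two lanes, and each output lane needs its neighbor's input. The function name `caxpy_interleaved` and the storage layout are assumptions chosen for illustration, not something specified in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* y[k] += a * x[k] for n complex numbers stored interleaved as
   re0, im0, re1, im1, ...; a = ar + i*ai. x and y must not overlap. */
void caxpy_interleaved(size_t n, float ar, float ai,
                       const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t k = 0; k < n; ++k) {
        float xr = x[2 * k];     /* real lane of element k */
        float xi = x[2 * k + 1]; /* imaginary lane of element k */

        /* The real output needs the imaginary input and vice versa,
           so a vectorized version needs a pair-swap plus alternating
           signs inside each 2-lane group, whatever the vector length. */
        y[2 * k]     += ar * xr - ai * xi;
        y[2 * k + 1] += ar * xi + ai * xr;
    }
}
```

A pure length-agnostic vector model treats every lane the same way, so operations like this are where a fixed-width sub-vector inside each element becomes useful.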
Also swept under the rug in Patterson's analysis: most of the ISA explosion in SIMD is integer operations that DO NOT EXIST in scalar form (mul-hi with rounding, etc). You'll still need those in your vector architecture, so you're not actually going to cut down the ISA much.
2 replies 0 retweets 0 likes
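To make "mul-hi with rounding" concrete, here is a scalar reference in C for a saturating, rounding Q15 multiply, roughly the operation that ARM's SQRDMULH performs per 16-bit lane (x86's PMULHRSW is similar but does not saturate). The function name `mulhrs_q15` is hypothetical. The point of the sketch is that plain scalar code needs a widening multiply, an add, a shift, and a clamp to express what SIMD ISAs offer as a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding, saturating Q15 multiply: round(a * b / 2^15), clamped to
   the int16_t range. */
int16_t mulhrs_q15(int16_t a, int16_t b) {
    /* Assumption: >> on negative values is an arithmetic shift, which
       is implementation-defined in C but true on mainstream compilers. */
    assert((-1 >> 1) == -1);

    int32_t p = (int32_t)a * (int32_t)b; /* widening 16x16 -> 32 multiply */
    int32_t r = (p + (1 << 14)) >> 15;   /* add half an ulp, keep high part */

    if (r > INT16_MAX) /* only reachable when a == b == INT16_MIN */
        r = INT16_MAX;
    return (int16_t)r;
}
```

For example, `mulhrs_q15(16384, 16384)` multiplies 0.5 by 0.5 in Q15 and returns 8192, which is 0.25.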
Well, to be fair, Intel *does* have a zillion shuffle instructions that could have been combined into just a handful.
-
-
That's an indictment of Intel's long-time unwillingness to just build a damn shuffle, not of SIMD.
0 replies 0 retweets 1 like