@rygorous Isn't that exactly what I did on the stream?
@cmuratori Okay, so another 10 minutes later you switch tacks, throw away the stuff you spent the first 35 minutes on, and do it differently?
@cmuratori _mm_storeu_si128 = unaligned write. Default writes for SSE regs are aligned (they assume a 16-byte-aligned address).
@cmuratori Both debug and release. You just got lucky in debug.
@cmuratori And it compiles into a MOVDQU, not a MOV. You were looking at the wrong line of code the second time. :)
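For reference, a minimal sketch of the aligned vs. unaligned store distinction discussed above. The helper names `store4_unaligned` and `store4_aligned` are made up for illustration and are not from the stream; the exact instruction emitted depends on the compiler, though `_mm_storeu_si128` typically becomes MOVDQU (or an equivalent unaligned move).

```c
#include <emmintrin.h>  /* SSE2 integer intrinsics */
#include <stdint.h>

/* Hypothetical helper: store four 32-bit ints to dst, which may have ANY alignment. */
static void store4_unaligned(void *dst, int32_t a, int32_t b, int32_t c, int32_t d)
{
    /* _mm_set_epi32 takes lanes from high to low, so lane 0 ends up holding a. */
    __m128i v = _mm_set_epi32(d, c, b, a);
    _mm_storeu_si128((__m128i *)dst, v);   /* unaligned store */
}

/* Hypothetical helper: same values, but dst MUST be 16-byte aligned.
   Passing a misaligned pointer here can fault at run time. */
static void store4_aligned(void *dst, int32_t a, int32_t b, int32_t c, int32_t d)
{
    __m128i v = _mm_set_epi32(d, c, b, a);
    _mm_store_si128((__m128i *)dst, v);    /* aligned store */
}
```

If the destination is, say, a pixel pointer that advances 4 bytes at a time, only the unaligned variant is safe in general; the aligned one works only when the address happens to be a multiple of 16.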
@cmuratori Conversion: you actually want cvtps (not cvttps), since the original code had the +0.5f; cvtps defaults to round to nearest.
@cmuratori cvtps uses the SSE rounding mode set in MXCSR, but round-to-nearest, ties-to-even is the default mode.
@cmuratori And unlike the x87 rounding mode, you do not generally have people messing with it. (SSE already had a separate convert-with-truncate instruction.)
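The two conversions mentioned above can be compared side by side. This is a small sketch, assuming MXCSR is still in its default round-to-nearest-even mode; `convert4_nearest` and `convert4_truncate` are hypothetical names used only for this example.

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */
#include <stdint.h>

/* Hypothetical helper: convert 4 floats to ints using the current MXCSR
   rounding mode (round to nearest, ties to even, unless someone changed it). */
static void convert4_nearest(const float *src, int32_t *dst)
{
    __m128 f = _mm_loadu_ps(src);
    __m128i i = _mm_cvtps_epi32(f);            /* CVTPS2DQ */
    _mm_storeu_si128((__m128i *)dst, i);
}

/* Hypothetical helper: convert 4 floats to ints, always truncating toward
   zero regardless of MXCSR. */
static void convert4_truncate(const float *src, int32_t *dst)
{
    __m128 f = _mm_loadu_ps(src);
    __m128i i = _mm_cvttps_epi32(f);           /* CVTTPS2DQ */
    _mm_storeu_si128((__m128i *)dst, i);
}
```

With the default mode, inputs {0.5f, 1.5f, 2.5f, -2.7f} give {0, 2, 2, -3} from `convert4_nearest` and {0, 1, 2, -2} from `convert4_truncate`. Note that exact ties go to the even integer, so the result is not bit-identical to adding 0.5f and truncating (which would turn 0.5f into 1 and 2.5f into 3).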