Tweets from @zwegner
Super Fun Unpaid Overtime Party https://twitter.com/elonmusk/status/1224087317364854785 …
Pushed a bunch of updates to x86-info-term, including data from http://uops.info https://github.com/zwegner/x86-info-term … pic.twitter.com/AWv6KqXICD
This is up now: https://github.com/zwegner/x86-info-term … (and in retrospect this comes off mildly click-baity, I will avoid hype-tweeting like this in the future)
Here we go, an intrinsics guide for the terminal: https://github.com/zwegner/x86-info-term … There are some rough edges (it's all just hacky curses calls), but it's pretty sexy looking so far, if I do say so myself. pic.twitter.com/jk1LSJXcRD
Stretch goal: use the pseudocode parser from x86-sat to provide syntax highlighting. Am I serious? Stay tuned...
Started yet another random side project last night: a terminal version of the Intel Intrinsics Guide. Soon I'll integrate timing info from http://uops.info. ...would anybody use this if I open-sourced it? pic.twitter.com/FSegnn0Ql5
Also spent a bit too long yesterday thinking of how to reduce the critical path by trying to apply a parallel prefix carry-lookahead adder to the parallel-prefix-popcount, which makes my head hurt (but would get 2 more P's in the acronym). I'm still not sure if it's possible
Shaved a handful of cycles off the zp7 PEXT/PDEP polyfill by noticing the last CLMUL by -2 wouldn't carry anyways, and thus can just be -x<<1. Code's up on GH now
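The shortcut in the tweet above can be checked with a small model. This is only a sketch, assuming the value fed to that last step has at most one set bit (my reading of "wouldn't carry"; the zp7 source is the authority on the real precondition). `clmul64` and `neg_shift` are hypothetical helper names, and `clmul64` models a carry-less multiply truncated to 64 bits rather than calling the actual instruction.

```python
MASK64 = (1 << 64) - 1
MINUS_TWO = MASK64 - 1  # -2 as an unsigned 64-bit constant


def clmul64(a, b):
    """Carry-less (XOR) multiply of two 64-bit values, truncated to 64 bits."""
    result = 0
    for i in range(64):
        if (b >> i) & 1:
            result ^= a << i
    return result & MASK64


def neg_shift(x):
    """Ordinary two's-complement (-x) << 1, truncated to 64 bits."""
    return ((-x) << 1) & MASK64


# Assumption: x has a single set bit. Then XOR and ADD never disagree.
single_bit_ok = all(
    clmul64(1 << k, MINUS_TWO) == neg_shift(1 << k) for k in range(64)
)

# Counterexample: with two adjacent set bits the two results differ,
# so the replacement is not valid for arbitrary inputs.
general_ok = clmul64(0b11, MINUS_TWO) == neg_shift(0b11)
```

Here `single_bit_ok` comes out true and `general_ok` false, which is consistent with the replacement being safe only in the final step.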
Cheers to @pshufb for asking really cool questions that get me to waste a bunch of time :) I don't even own any AMD chips...
I gave up on doing an x86-sat proof of this code, since Intel's PEXT/PDEP pseudocode requires writing to different result bits depending on input data, which is quite hard to transform to Z3. Instead there's a test with a bunch of random inputs
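For readers unfamiliar with the instructions, here is a bit-by-bit reference model of 64-bit PEXT/PDEP plus a random round-trip check. It is a plain loop, not the branchless zp7 technique, and the names `pext64`, `pdep64`, and `random_roundtrip_check` are made up for this sketch. The counter `k` shows the data-dependent destination index mentioned above.

```python
import random


def pext64(src, mask):
    """Gather the bits of src selected by mask into the low bits of the result."""
    result, k = 0, 0
    for i in range(64):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << k
            k += 1  # destination index advances only on set mask bits
    return result


def pdep64(src, mask):
    """Scatter the low bits of src to the positions of the set bits in mask."""
    result, k = 0, 0
    for i in range(64):
        if (mask >> i) & 1:
            result |= ((src >> k) & 1) << i
            k += 1  # source index advances only on set mask bits
    return result


def random_roundtrip_check(trials=1000, seed=0):
    """Check pdep(pext(x, m), m) == x & m on random 64-bit inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, m = rng.getrandbits(64), rng.getrandbits(64)
        if pdep64(pext64(x, m), m) != x & m:
            return False
    return True
```

A polyfill could be compared against `pext64`/`pdep64` in the same randomized way; this gives evidence rather than a proof.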
Finished up my branchless PEXT/PDEP polyfill for AMD chips, featuring some gnarly bit twiddling and an acronym with 7 P's. Behold: https://github.com/zwegner/zp7
Huh, all this time I had naively thought rdtsc was useful for measuring cycles but not actual time because of turbo boost. Turns out it's the opposite, rdtsc always counts at the base freq. So the counter apparently crosses a clock domain? That's pretty weird
The upcoming VP2INTERSECT AVX-512 instructions are pretty cool, doing 32/64-bit pairwise equality of two vectors. AFAIK they're the first x86 instructions with indexed register file access, writing to two consecutive mask registers (always an even/odd pair)
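A rough model of what the instruction computes, as I understand the documented semantics: every lane of one vector is compared against every lane of the other, and two masks come back, one per input. The function name `vp2intersect` and the use of plain Python lists for vector lanes are assumptions of this sketch, not anything from Intel's documentation.

```python
def vp2intersect(a, b):
    """Return (k_even, k_odd) bit masks for two equal-length lists of lanes.

    Bit i of k_even is set if a[i] equals any lane of b.
    Bit j of k_odd is set if b[j] equals any lane of a.
    """
    if len(a) != len(b):
        raise ValueError("both vectors must have the same number of lanes")
    k_even = k_odd = 0
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                k_even |= 1 << i
                k_odd |= 1 << j
    return k_even, k_odd


# 4-lane example: 1 and 4 appear in both vectors.
masks = vp2intersect([1, 2, 3, 4], [4, 4, 9, 1])
```

In the example, `masks` is `(0b1001, 0b1011)`: lanes 0 and 3 of the first vector have a match, and lanes 0, 1, and 3 of the second do.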
...perhaps I should note that this is detecting live locks of internal uarch state, not user code. IOW "we can't verify that the arch always makes forward progress, sometimes it doesn't". That's rather scary IMO, and I can't imagine it's better with ~20y of added complexity
TIL that Intel put a live lock detector into at least P6/P4 to flush all speculative state if no uops had retired in X cycles. I wonder if that's still around... (from http://newsletter.sigmicro.org/sigmicro-oral-history-transcripts/Bob-Colwell-Transcript.pdf …) pic.twitter.com/jd22EtRuey
First non-trivial result is in the code: finding popcount(x & 0x01010101) == (x & 0x01010101) * 0x01010101 >> 24. (the catch is that it's only searching through combinations of mul/and/shr, and only 2*32 bits worth of constants)
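The identity is small enough to verify exhaustively. The sketch below assumes 32-bit wrapping arithmetic (hence the explicit `MASK32`, since Python integers do not overflow) and reads the expression with the usual precedence, multiply before shift. The function names are illustrative only.

```python
MASK32 = 0xFFFFFFFF
LOW_BITS = 0x01010101  # lowest bit of each of the four bytes


def popcount_via_mul(x):
    """(x & 0x01010101) * 0x01010101 >> 24, with 32-bit wraparound."""
    y = x & LOW_BITS
    return ((y * LOW_BITS) & MASK32) >> 24


def popcount_reference(x):
    """Straightforward popcount of the masked value."""
    return bin(x & LOW_BITS).count("1")


def spread(n):
    """Place bit k of a 4-bit number n at bit position 8*k."""
    return sum(((n >> k) & 1) << (8 * k) for k in range(4))


# Only 4 bits survive the mask, so 16 cases cover every possible input.
all_match = all(
    popcount_via_mul(spread(n)) == popcount_reference(spread(n))
    for n in range(16)
)
```

The multiply sums the four masked bytes into the top byte, and because each partial sum is at most 4, no carry crosses a byte boundary.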
Another small project: I made a toy SIMD superoptimizer in ~100 lines of Python using x86-sat. It's not particularly smart about managing the search space, but it's found a couple cool things so far. https://github.com/zwegner/x86-sat/blob/master/optimize.py …
...this also helped me find a bug in @Intel's Intrinsics Guide (described in the README). Possibly of interest to @geofflangdale @trav_downs @lemire @jkeiser2
My latest little 1-day project: generating a Z3 SAT model of x86 intrinsics from Intel's documentation. This can help prove SIMD code correct, find lookup table values, etc. https://github.com/zwegner/x86-sat