@krismicinski, Intel can JIT register allocate stack values?!
Replying to @wilbowma @krismicinski
something like that, yes. post your code and ask one of the heavy hitters around here like @rygorous or @stephentyrone
Replying to @johnregehr @wilbowma and others:
You get two loads and one store per cycle these days. (And still have issue slots to spare for other work.) It depends on what you're doing and how you're measuring, but as long as you hit the L1 cache, loads and stores are by no means slow.
Replying to @rygorous @johnregehr and others:
Is there extra latency associated with store-to-load forwarding, or is that pretty much the same as if a register had been used (assuming the address computation does not itself stall)?
Replying to @capitalist_void @johnregehr and others:
store to load forwarding is somewhat slower than a regular L1 hit, yes. Assuming you actually get forwarding, ballpark 50% slower. (~6 instead of ~4 cycles latency). If you do things that break forwarding, the penalties can get way bigger, because that requires a partial flush.
Replying to @rygorous @capitalist_void and others:
But x86 in particular has been forced to get really good at loads/stores and store forwarding, because 32-bit x86 code is still common and quite register-starved. It takes some effort to make the various tricks not work. :)
Replying to @rygorous @capitalist_void and others:
Anyway, back to what John said: actually, no memory renaming in any current x86 (that I know of, anyway). Loads/stores do always execute as such. It's just that current machines are really quite good at both. (And have quite the capacity for them!)
Replying to @rygorous @capitalist_void and others:
wait!!! I thought some smallish number of stack slots really did get effectively registered, is this not the case?
Replying to @johnregehr @capitalist_void and others:
Nope! The only stack optimization there is is some frontend trickery to speed up PUSH/POP/CALL/RET by virtualizing the stack pointer.
Replying to @rygorous @johnregehr and others:
The net effect is that the loads/stores implicit in these instructions stay, but the stack pointer increments/decrements get collected and batched.
so if you do, say:

    push rax
    push rbx
    push rcx

that internally gets translated to:

    mov [virtual_rsp - 8], rax
    mov [virtual_rsp - 16], rbx
    mov [virtual_rsp - 24], rcx
    ; remember that the virtual RSP has moved by 24 bytes
Replying to @rygorous @johnregehr and others:
and at some point in the future an actual stack pointer update will get injected into the instruction stream - namely, when one of these happens:
1. the virtual offset gets too large
2. an interrupt/exception occurs
3. code that actually looks at/modifies rsp directly is about to execute