i.e. the relevant part is which opcode it is, not what the register is (you can use the r32/r64 version with RAX)
variant 1, "adc eax, immed" (or the rax version) is {REX.W} 0x15 <imm32>
variant 2, "adc r32/r64, immed" (reg can be eax/rax) is {REX.W} 0x81 <ModRM /2> <imm32>
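To make the two encodings concrete, here is a minimal Python sketch that assembles both variants by hand. The function names are made up for illustration, and the second one only handles registers 0..7 (r8..r15 would also need REX.B), so treat it as a sketch under those assumptions rather than a real assembler.

```python
import struct

REX_W = 0x48  # REX prefix with only the W bit set (64-bit operand size)


def adc_acc_imm32(imm, wide=False):
    """Variant 1: accumulator-only form, opcode 0x15 followed by imm32."""
    prefix = bytes([REX_W]) if wide else b""
    return prefix + bytes([0x15]) + struct.pack("<I", imm & 0xFFFFFFFF)


def adc_reg_imm32(reg, imm, wide=False):
    """Variant 2: general form, opcode 0x81 with ModRM /2 and imm32.

    reg is a register number 0..7 (0 = eax/rax). Registers r8..r15 need
    REX.B and are deliberately left out of this sketch.
    """
    if not 0 <= reg <= 7:
        raise ValueError("this sketch only handles registers 0..7")
    prefix = bytes([REX_W]) if wide else b""
    # mod = 11 (register direct), reg field = 2 (the /2 opcode extension), rm = reg
    modrm = 0xC0 | (2 << 3) | reg
    return prefix + bytes([0x81, modrm]) + struct.pack("<I", imm & 0xFFFFFFFF)


if __name__ == "__main__":
    # Same operation (adc eax, 0x12345678), two different opcodes.
    print(adc_acc_imm32(0x12345678).hex(" "))
    print(adc_reg_imm32(0, 0x12345678).hex(" "))
```

Both calls describe the same operation on eax, but the byte strings differ in opcode and length, which is why the decoder treats them as separate instructions.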
I don't know why the accumulator variant has an extra uop, but with x86 it's always important to remember that there are a bunch of 1-byte instruction variants that only work with the accumulator; they look similar in a disassembly listing but are completely separate opcodes
I wish AMD64 had thrown out way more of the old 1-byte encodings; all the accumulator ops are super rarely useful, have these weird potholes, and take up really valuable real estate in the opcode map that could then have been used for other things.
a whole, connected, 1/16th of the entire 1-byte opcode map (bit pattern 00***10*) is devoted to these things and they're most definitely not worth it
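The 1/16th figure can be checked by enumerating every 1-byte opcode that matches 00***10*. This is a small Python sketch; the mnemonic table reflects the standard ordering of the eight ALU ops in bits 5..3, and the operand strings assume the default 32-bit operand size (the odd opcodes become ax or rax forms with a 0x66 or REX.W prefix).

```python
# ALU operation selected by bits 5..3 of the opcode.
MNEMONICS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]


def accumulator_imm_opcodes():
    """Return (opcode, description) for every byte matching 00***10*."""
    result = []
    for opcode in range(256):
        # Fixed bits: 7 and 6 are 0, bit 2 is 1, bit 1 is 0 -> mask 0xC6, value 0x04.
        if opcode & 0b11000110 == 0b00000100:
            op = MNEMONICS[(opcode >> 3) & 7]
            # Bit 0 picks the byte form (al, imm8) or the full-size form.
            operand = "eax, imm32" if opcode & 1 else "al, imm8"
            result.append((opcode, f"{op} {operand}"))
    return result


if __name__ == "__main__":
    table = accumulator_imm_opcodes()
    for opcode, text in table:
        print(f"0x{opcode:02X}  {text}")
    print(f"{len(table)} of 256 opcodes")
```

Four of the eight bits are fixed, so the pattern covers 16 of the 256 slots, and 0x15 from the adc example above is one of them.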
Imagine if they'd used that space for, say (random example, didn't look at real-world code to eval how good that particular proposal is) a "src1" prefix that lets you specify a 4-bit register number for the first source (independent of destination) on any GPR instruction
That would replace a sizable fraction of all reg-reg moves in x64 code (generally either 2B or 3B) with a 1B prefix, which would be much more generally useful than shorter accumulator ops.
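For reference, the "2B or 3B" size of a reg-reg move falls straight out of the encoding. The Python sketch below (hypothetical helper name, using the 0x89 "mov r/m, r" opcode and register numbers 0..15) shows when the REX byte is needed; it does not model the proposed "src1" prefix, which is only a thought experiment in this thread.

```python
def mov_reg_reg(dst, src, wide=False):
    """Encode 'mov dst, src' between two GPRs using opcode 0x89 (mov r/m, r).

    dst and src are register numbers 0..15; wide=True selects 64-bit operands.
    """
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("register numbers must be in 0..15")
    rex = 0x40
    if wide:
        rex |= 0x08  # REX.W: 64-bit operand size
    if src >= 8:
        rex |= 0x04  # REX.R: extends the ModRM reg field (source here)
    if dst >= 8:
        rex |= 0x01  # REX.B: extends the ModRM rm field (destination here)
    # mod = 11 (register direct), reg = source, rm = destination
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)
    prefix = bytes([rex]) if rex != 0x40 else b""
    return prefix + bytes([0x89, modrm])


if __name__ == "__main__":
    print(mov_reg_reg(0, 3).hex(" "))             # mov eax, ebx
    print(mov_reg_reg(0, 3, wide=True).hex(" "))  # mov rax, rbx
    print(mov_reg_reg(8, 0).hex(" "))             # mov r8d, eax
```

A 32-bit move between the low eight registers needs no REX byte and comes out at 2 bytes; anything 64-bit or touching r8..r15 needs REX and comes out at 3.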
That particular example is literally just the first thing that came to mind; my point is just that having that amount (and quality) of space freed up is enough for substantial re-engineering, so it's a shame that AMD64 didn't do it. :/
(It was the one legitimate chance to remove anything from future x86 that anyone ever got.)
Replying to @rygorous @Jonathan_Blow
One assumes it would have been best to take the x86/SSE instruction set and "rebalance the huffman tree" such that you tried to have the size of each encoding correspond to its frequency? But I assume they didn't do this because they still wanted to run x86 at speed...
Replying to @cmuratori @Jonathan_Blow
They wanted to not have a separate decoder, basically, and x86_64 is definitely close enough to do 32b and 64b with the same decoder block (and many internal flag bits).
Yes. Which is a shame, because then that means we are paying for it for the rest of time, basically :/