I count 380 128 bit instructions to perform the scrambling, and then there is a bit on top of that to make the whole thing work. Using 256 bit instructions you'd perform two blocks in parallel, yielding 128 bytes of stream. So 1.2 cycles per byte comes out to 2.5 instructions ...
-
-
-
... per clock, plus some for moving stuff around. Seems pretty reasonable to me.
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.