in cryptography we assume some things about the hardware, usually including that it uses two's complement and has a constant-time hardware multiply instruction
-
Show this thread
-
so for example crypto on older macs with powerpc chips is "not possible" (not without a lot of effort) because the chip's multiplication instruction first looks to see if either multiplicand is 0 or 1, in which case it short circuits and returns 0 or the other multiplicand rsp
2 replies 1 retweet 13 likesShow this thread -
similarly also for ARM cortex-M3 chips, the multiply instruction can take 1-2 cycles less if both multiplicands are ≤ 2^16, either multiplicand is 0, or—somewhat strangely—either multiplicand is a power of two
1 reply 0 retweets 9 likesShow this thread -
of course there's ways around non-constant-time multiply instructions, like the well-documented tricks
@BearSSLNews uses (cf. https://www.bearssl.org/ctmul.html or below) but afaik all them rely on tricking *some* form of a hardware multiply instruction into good behaviourpic.twitter.com/wDV8N6nUSU
1 reply 0 retweets 14 likesShow this thread -
in my quest to make commodore 64s secure against attackers with quantum computers by implementing supersingular isogeny key encapsulation in 6510 assembly, i obviously need constant-time multiplication, but forget even variable-time IT DOES'T HAVE *ANY* MULTIPLICATION INSTRUCTION
2 replies 4 retweets 38 likesShow this thread -
before jumping into the assembly (THERE WAS A JOKE THERE, DID YOU SEE, DID YOU SEE IT) maybe i should first show some C taken from an older version of BoringSSL which multiplies two n-bit numbers into a 2n-bit result in constant-time (albeit relying on hardware multiplication)pic.twitter.com/VwLCU6cDrC
1 reply 0 retweets 27 likesShow this thread -
here's a fairly "simple" variable-time 8-bit x 8-bit -> 16-bit multiplication algorithm in 6502/6510 assembly, which indexes over the bits of the b multiplicand and conditionally either doubles or add-then-doubles, taking 146 cycles (best case) to 184 cycles (worst case)pic.twitter.com/Nh3hoONQRF
4 replies 1 retweet 14 likesShow this thread -
here's the same routine made constant time by always adding-then-doubling which requires 283 instructions AND TAKES 374 CYCLES JUST TO MULTIPLY TWO BYTESpic.twitter.com/TBV7N5mlUh
1 reply 0 retweets 19 likesShow this thread -
the 6510 chips in commodore 64s run at ~1MHz depending on whether it's the PAL or NTSC version, and a field element in this 434-bit prime field takes 56 bytes, so multiplying two field elements takes roughly 20,944 cycles or ~21ms assuming page boundaries aren't crossed
2 replies 1 retweet 11 likesShow this thread -
i'm not sure how many field element operations i'm going to need to walk the isogeny graph yet, but i feel pretty confident that this is going to be the slowest post-quantum cryptographic implementation in existence, and quite possibly just straight up slowest crypto in the world
1 reply 2 retweets 21 likesShow this thread
this amusingly means that i will hold the title for implementing both the fastest and slowest elliptic curve related cryptographic implementations in the world 

-
-
Replying to @isislovecruft
I'd probably go for multiplying 4 bits at a time with lookup tables.
1 reply 0 retweets 0 likes -
Replying to @PhongKoans
i'm aware of various lookup table techniques but i'm worried that the tables will take up too much memory (i'm already at ~4KB for the field implementation and still need to implement the Montgomery curve and isogenies)
0 replies 0 retweets 1 like
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.
𝖍𝖆𝖘𝖍 𝖋𝖚𝖓𝖈𝖙𝖎𝖔𝖓𝖘