O.k., tweet-thread time! We have to to talk about constant-time systems! You might think that's all about cryptography and security, but actually this is about availability! Bonus content: Maxwell's Daemon. It's good stuff! NO REFUNDS.
-
-
O(1) is often also short-hand for constant-work; where we spend the same amount of energy or whatever, no matter the inputs/outputs.
Show this thread -
Being O(1) often matters for cryptography and security. For example if you check a password sequentially ... for (int i = 0; i < pwlen; i++) { if (input[i] != pw[i]) { return FAILURE; } } return SUCCESS; that's not O(1) ... and it's bad.
Show this thread -
it's bad because it takes longer to run if the password nearly matches, and attackers can measure that, and this is the classic example of a side-channel, and byte by byte they can totally crack your password like this just by measuring time.
Show this thread -
The fix is to make it O(1) with awful bitwise operations. It's ugly and mean and hard to follow, but something like: difference = 0; for (int i = 0; i < pwlen; i++) { difference |= pw[i] ^ input[I]; } return difference == 0; don't worry if you don't understand this.
Show this thread -
Anyway, so Crypto people have been studying this for a long time. Our Automated Reasoning Group built a tool to analyze code and tell you if it's O(1) ... or how close it is ...https://github.com/awslabs/s2n/tree/master/tests/sidetrail …
Show this thread -
But it turns out that O(1) patterns matter for distributed systems and availability EVEN MORE! Completely unrelated fields really, it's surprising, so let's dig in ...
Show this thread -
O.k. so imagine a very simple control plane. Let's say you have a bunch of servers and they do things for customers and their users. Customer gets some knobs and dials, a configuration they can edit. This is lots and lots of web services.
Show this thread -
So customer makes a change. What do we do? Well it goes to an API, and that API produces a diff or a delta or whatever you want to call it and sends it to the servers. The servers then apply the patch, and the new config is in place. SIMPLE, RIGHT?
Show this thread -
Well, no, sometimes servers are down and errors happen, so you need a workflow engine to drive retries. Oh and that implies you have some way to tell if the change even made it there, so you need a poller or a pusher or something to monitor config propagation status. YOU GET ME.
Show this thread -
But then we build all that, and it works, and changes get integrated and customer configs change in seconds, and WE'RE GOOD, RIGHT?
Show this thread -
We're good until one day we're not. Imagine a big event happens, maybe a power outage, or a spike on the internet due to the Super Bowl or something, and for whatever reason a bunch of customers all try to make changes at the same time?
Show this thread -
The API gets choked, the workflow gets backed up, the status monitors start to lag. OUCH. Even worse, some customers start undoing their changes because they don't see them happening quickly. Now we have pointless changes stuck in the system that no-one even wants!! OUCH OUCH.
Show this thread -
Now, let's try a much better, O(1) control plane. DUMB AS ROCKS. Imagine if instead that the customer API pretty much edits a document directly, like a file on S3 or whatever. And imagine the servers just pull that file every few seconds, WHETHER IT CHANGED OR NOT.
Show this thread -
This system is O(1). The whole config can change, or none of it, and the servers don't even care. They are dumb. They just carry on. This is MUCH MUCH more robust.
Show this thread -
It never lags, it self-heals very quickly, and it's always ready for a set of events like everyone changing everything at once. The SYSTEM DOES NOT CARE.
Show this thread -
This is how we build the most critical control planes at AWS. For example if you use Route 53 health checks, or rely on NLB, or NAT GW, the underlying health statuses are *ALWAYS* being pushed around as a bitset. ALWAYS ALWAYS.
Show this thread -
A few statuses can change (which is normal), and the system reacts, or they can ALL change, and it will react just the same (which is abnormal). We could have lots of targets suddenly disappear .... break ...
Show this thread -
"I felt a great disturbance in the Force, as if millions of voices suddenly cried out in terror and were suddenly silenced. I fear something terrible has happened."
Show this thread -
... end break ... and the system does not care or even know how many changed! It just works. Very very reliable, and unde-rapreciated.
Show this thread -
BONUS CONTENT: O.k., what does this have to do with Maxwell's Daemon? WHAT EVEN IS THAT? Well the first kind of constant time problem, for crypto, is an example of minimizing information theoretical entropy. It's about minimizing perceptible disorder and what is computable.
Show this thread -
The second is more like traditional thermodynamic entropy. With a constant-time control plane, we're trying to build a system that has a constant temperature, because it's always doing the same amount of work.
Show this thread -
Ok now if I lost you with that, here's the simple version: when systems change a lot, they undergo stress, and getting stressed at the worst times, like under attacks or in the middle of outages, is really bad. At a high level these problems are the same.
Show this thread -
And it turns out at a low level they are the same too! Information theoretical entropy, which is all about computability and bits, can be related to thermodynamic entropy, which is all about energy and work!
Show this thread -
How? Well Maxwell proposed a thought experiment that showed a problem with thermodynamics. He imagined two rooms side by side, at even temperatures. He posited that a daemon, a ghost, could open a door between the two rooms and let faster moving molecules over to one side.
Show this thread -
This sort of disproved existing thermodynamics, because the work it takes to open a door like that is less than the energy transferred due to the faster molecules changing sides. This was unresolved for a long time ... UNTIL ...
Show this thread -
A bunch of folks showed that you don't just open the door, you have to observe and predict, and therefore COMPUTE, the position of the molecule. THIS TAKES ENERGY ... and they showed that the energy it takes is at least equivalent to the difference.
Show this thread -
So in a big old nuts, strange-loop sort of way, these completely unrelated things are totally the same at both a MACRO and MICRO level. This blows my mind .... but may also make no sense ;-)
Show this thread -
O.k. that's it, and AMA if you want! Oh and
@threadreaderapp please unroll this mess I made.Show this thread
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.