Thursday tweet thread time! This one is all about what we do in Amazon s2n to prevent security issues similar to this week's libssh problem.https://twitter.com/lvh/status/1052272548560478208 …
-
-
before it even runs some piece of code is probably like "hey this is a hello message, call that parser!" and then at the end that parser will have a series of branches that are basically "hey, if I'm in this mode with X enabled, go here next, otherwise go there next".
Show this thread -
When you're reading the code locally everything makes sense, because you're not thinking of other entry points or things being out of order. There's nothing glaring there.
Show this thread -
One way to catch these errors is to use a protocol fuzzer; something that sends messages in random orders, tries to jam things up. But those are hard to write ... the messages need to be somewhat valid to get so far, and the combinatorial space to explore can be HUGE.
Show this thread -
In s2n we do something differently. We borrow a trick first learned from S3's Doug Lawrence, combined with another trick from
@signalapp's@trevp__ .... we linearize the state machine into a static table, and we try to negotiate EVERYTHING just once. O.k. what does that mean ...Show this thread -
O.k. so rather than having message parsers and generators that can transition states at all, we make them standalone, all they do is parse or generate messages. So how do we know which ones to call and when?
Show this thread -
We put everything in a table, and use function pointers, see s2n's at:https://github.com/awslabs/s2n/blob/master/tls/s2n_handshake_io.c#L62 …
Show this thread -
and then we fully linearize every possible valid order of states ... https://github.com/awslabs/s2n/blob/master/tls/s2n_handshake_io.c#L85 …
Show this thread -
With that approach, when we negotiate what options/modes/extensions you're going to use, we do it just once and handcuff ourselves to the corresponding linearized set of state transitions. That is the only order we will then follow. An out of order message shuts everything down.
Show this thread -
That makes our state machine itself tiny, it's here: https://github.com/awslabs/s2n/blob/master/tls/s2n_handshake_io.c#L253 … all it does is increment a value! Btw one of our coding tenets is never to mix control flow and message parsing in the same function. Big win here!
Show this thread -
O.k. so that's the basic approach, but how do we then test it? how do we make sure that all of our valid states are correct? With some help from
@galois , we formally verify it! Here's how ...Show this thread -
So we have a declarative definition of the valid TLS states that come from the RFC. It's written in Cryptol, and is here: https://github.com/awslabs/s2n/blob/master/tests/saw/spec/rfc-handshake.cry#L134 … now that's done by hand and we have to trust it, it's the "bottom turtle" in the proof, but it's easy to read and review.
Show this thread -
We then use SAW to map our s2n_handshake C code to something that can be symbolically executed for all possible inputs. Here's the SAW code that does the mapping: https://github.com/awslabs/s2n/blob/master/tests/saw/s2n_handshake_io.saw …
Show this thread -
So together, these actually check that our handshake code really does all TLS valid state transitions correctly, and only those. We caught bugs when we first added it too! Not a security issue, but it turns out we didn't support an obscure combination of OSCP and resumption.
Show this thread -
Now my favorite part ... how do we know that the verification itself actually works? We run it on every build, but what does that say?
Show this thread -
We *also* "verify the verifier" by forcing some negative test cases. We actually patch the code with known errors: https://github.com/awslabs/s2n/tree/master/tests/saw/failure_tests … and check that the formal verification fails! This is super important for checking formal verf btw, and often overlooked.
Show this thread -
We also do fuzz tests (https://github.com/awslabs/s2n/tree/master/tests/fuzz …) and we have integration tests with other implementations, to check for inter-op.
Show this thread -
Our goal with tooling is to ensure that we have failsafes beyond code-review. For these state machine type bugs, it actually takes quite a lot! There's a lot more code verifying it than implementing it. It's not surprising that these issues crop up in real world software.
Show this thread -
Anyway, that's it unless you want to AMA. I'll just ask
@threadreaderapp to please unroll this thread!Show this thread
End of conversation
New conversation -
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.