3) There are a lot of steps to processing an order: a) DDoS protection b) risk checks c) queue for matching d) matching e) updating from fills f) sending out updates to users (plus lots in between).
4) Each has their own weird idiosyncrasies. This is why you get people arguing over what latency is: are you measuring a-f? a-c? d-f? This also means that there are lots of ways things could go wrong.
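To make the "what latency are you measuring" point concrete, here is a minimal sketch. The stage letters follow the a-f list in the thread; the timestamps and the `latency_ms` helper are made up for illustration and are not real exchange numbers.

```python
# Stage letters follow the thread's a-f list.
STAGES = ["a", "b", "c", "d", "e", "f"]

def latency_ms(done_at, start_ms, first, last):
    """Time from entering stage `first` to finishing stage `last`."""
    i = STAGES.index(first)
    # A stage is entered when the previous stage finishes.
    entered = start_ms if i == 0 else done_at[STAGES[i - 1]]
    return done_at[last] - entered

# Hypothetical completion times (ms) of each stage for one order.
done_at = {"a": 1, "b": 3, "c": 4, "d": 5, "e": 40, "f": 90}

print(latency_ms(done_at, 0, "a", "f"))  # whole pipeline
print(latency_ms(done_at, 0, "a", "c"))  # up to the matching queue
print(latency_ms(done_at, 0, "d", "f"))  # matching through user updates
```

The same order yields three very different "latencies" depending on which span is measured.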
5) FTX hasn't had any problems with a-d in a while, a consequence of upgrades we've made over the last few months. It's one of the few exchanges with no downtime today. However, e/f did have delays: all the trading happened correctly but it took a while to notify in some cases.
6) So the upgrade we rolled out late last night specifically addresses e/f, 3x'ing the capacity.
7) One side-note here: what does it mean to 'upgrade' an exchange? _Sometimes_ it means you buy a bigger server. We are really fucking embarrassed if that happens to us. Why? Because if _that_ was the bottleneck, we should have done it a while ago.
8) So for us, 'upgrade' almost always means 'parallelization'. The core problem of matching engines is: you can always use more computers to match, but that only works if you can process in parallel. Say Alice and Bob both submit orders to buy. Steps a & b can be done in parallel.
9) Risk checks are generally account specific, so you can just have a different computer handle each of theirs. But then you hit c & d: actually matching the orders. And the thing is, they're not independent. If Alice lifts an offer, Bob can't lift that same offer.
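A rough sketch of why account-specific risk checks parallelize easily: each check touches only its own account's state, so separate workers never need to coordinate. The `BALANCES` table, the `risk_check` rule, and the order tuples are all assumptions for illustration, not the exchange's actual logic.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-account state: a risk check only reads its own account.
BALANCES = {"alice": 1_000.0, "bob": 50.0}

def risk_check(order):
    """Accept the order if the account can afford price * size."""
    account, price, size = order
    return account, BALANCES[account] >= price * size

orders = [("alice", 100.0, 5), ("bob", 100.0, 5)]

# Each order is checked by whichever worker is free; no shared writes.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(risk_check, orders))

print(results)
```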
10) At least, not if they're in the same market. So you can't have one computer handle Alice's matching and another handle Bob's, at least not trivially, or they'll clash with each other, both thinking they lifted the same offer. This is when it gets tricky to increase capacity.
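The clash can be shown with a toy order book. This is only a sketch: the `lift` helper and the one-offer book are hypothetical, and copying the book stands in for two machines that each hold their own view of the same market.

```python
import copy

def lift(book, buyer):
    """Take the best offer if any remains; return a fill or None."""
    if book["offers"]:
        price = book["offers"].pop(0)
        return (buyer, price)
    return None

book = {"offers": [100.0]}  # one offer resting in one market

# Two matchers, each working from its own copy of the book: both "fill".
clashing = [lift(copy.deepcopy(book), b) for b in ("alice", "bob")]

# One matcher for the market, processing orders in sequence: only one fill.
serial = [lift(book, b) for b in ("alice", "bob")]

print(clashing)
print(serial)
```

With independent copies, one offer gets sold twice; with a single sequential matcher, Bob correctly gets nothing.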
11) e & f are somewhere in between. In some sense, f should be parallelizable: just have separate computers handle updates for each fill. But what if Bob simultaneously buys BTC/USD and BTC/USDT? Both increase his BTC balance.
12) If they don't talk to each other, both will report too low of a BTC balance for him! So there's still some tricky work to do there on the interplay between similar but different fills. And if you can't parallelize it, then you're stuck going one at a time.
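A small worked example of the balance problem, with made-up fill sizes. Workers that don't talk to each other each add only their own fill to the starting balance; a coordinated path applies both fills to one running balance.

```python
start_btc = 0.0
# Hypothetical fills for Bob in two markets that share the BTC balance.
fills = [("BTC/USD", 1.0), ("BTC/USDT", 2.0)]

# Independent workers: each only knows the starting balance and its own fill.
independent = [start_btc + qty for _, qty in fills]

# Coordinated: fills for the same account are applied in order to one balance.
coordinated = []
balance = start_btc
for _, qty in fills:
    balance += qty
    coordinated.append(balance)

print(independent)
print(coordinated)
```

Neither independent report reaches the true final balance, which is the "too low" effect described above.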
13) And if you're going one by one and trading is really busy, the _single_ computer handling all fill notifications will get a bigger and bigger queue of things to notify. So at some point the real solution here has to be finding a way to do more than one at once.
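The queue growth is simple arithmetic: if notifications arrive faster than they are sent, the backlog grows linearly with time. The sketch below assumes steady rates, and the numbers are invented for illustration.

```python
def backlog(arrivals_per_s, per_worker_per_s, workers, seconds):
    """Notifications still waiting after `seconds` of steady load."""
    surplus = arrivals_per_s - per_worker_per_s * workers
    return max(0, surplus * seconds)

# Hypothetical load: 3000 fills/s arriving, each worker sends 1000/s.
print(backlog(3000, 1000, workers=1, seconds=60))  # one worker falls behind
print(backlog(3000, 1000, workers=3, seconds=60))  # three workers keep up
```

Under these assumed rates, a single worker is 120,000 notifications behind after a minute, while three parallel workers clear the queue.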
14) (FWIW, when and Gary were talking about this today, the metaphor used was basically of a symphony with a conductor, and improvements were to the conductor that coordinated which computer processed which fills.)
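One way to picture a "conductor" is a router that decides which worker gets which fill. The sketch below is an assumption, not the actual design: it routes by account with a stable hash, so all of one account's fills land on the same worker in order while different accounts spread across workers. The `route` helper and the fill tuples are hypothetical.

```python
import zlib
from collections import defaultdict

def route(account, n_workers):
    """Stable worker index for an account (crc32 is deterministic across runs)."""
    return zlib.crc32(account.encode()) % n_workers

fills = [
    ("bob", "BTC/USD", 1.0),
    ("alice", "ETH/USD", 4.0),
    ("bob", "BTC/USDT", 2.0),
]

# The "conductor": assign each fill to a per-worker queue.
queues = defaultdict(list)
for fill in fills:
    queues[route(fill[0], 4)].append(fill)

for worker, items in sorted(queues.items()):
    print(worker, items)
```

Because both of Bob's fills go to one worker, that worker can update his BTC balance sequentially without cross-machine coordination.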
16) One missing thing from this: some of the 'clashing' when you try to parallelize things comes from updating the same single database too frequently; there's a data storage parallelization problem that mirrors the data processing one.