Among the many benefits of running @bitfinex outside of AWS (predictability of the system, performance, ...), we were sick and tired of AWS rebooting instances without much notice.
On many occasions AWS sent the notification only after the fact.
Replying to @paoloardoino @bitfinex
If you can’t build your systems in a way that is so resilient to this that you don’t even notice... then you need a new engineering team.
1 reply 0 retweets 1 like -
Replying to @mikekelly85 @bitfinex
Nah. @bitfinex has the best engineering team. In fact, it is almost the only exchange that always stays up during volatile markets. Our infra has replicas, sharding and 100% resiliency. I just don't like random reboots. Dealing with high-frequency trading is not like dealing with an e-commerce site.
2 replies 0 retweets 10 likes -
Replying to @paoloardoino @bitfinex
If you don’t like random reboots your system is brittle. Even if they didn’t happen, you should arguably be triggering them randomly anyway to make sure your system is robust. This is what proper engineering teams do https://netflix.github.io/chaosmonkey/
1 reply 0 retweets 0 likes -
Replying to @mikekelly85 @bitfinex
You do understand that "not liking random reboots" is different from "not being prepared", right?
1 reply 0 retweets 1 like -
Replying to @paoloardoino @bitfinex
Ok so what is the reason for your dislike?
1 reply 0 retweets 0 likes -
Replying to @mikekelly85 @bitfinex
If you have random reboots, especially on the matching engine, which is the single source of truth for an exchange, you can of course fail over, but there is a lot of follow-up checking to do to ensure no message got lost, etc. When you deal with people's money, predictability is important.
1 reply 0 retweets 1 like -
Replying to @paoloardoino @bitfinex
In other words, you designed a system that is too brittle to run on AWS reliably.
2 replies 0 retweets 0 likes -
Replying to @mikekelly85 @bitfinex
Or maybe @bitfinex avoiding AWS, using dedicated infra + multiple locations and custom failover, and not relying on someone else's magic, is what saves @bitfinex from having frequent issues. E.g. yesterday many exchanges went down because AWS had issues. Doing it yourself pays off over time.
1 reply 0 retweets 2 likes -
Replying to @paoloardoino @bitfinex
None of those potentially beneficial properties have anything to do with cycling of instances, which was the original point you made.
1 reply 0 retweets 0 likes
When you have to manage a matching engine that has to ensure 50 microseconds and 100k orders/sec and you get a random reboot, then you can tell me. We fail over of course, but the reconciliation checks afterwards always require significant effort. So if it's frequent, it's annoying.
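
To illustrate the kind of post-failover check mentioned in the thread ("ensure no message got lost"), here is a minimal sketch. It is not Bitfinex's actual procedure: it assumes, purely for illustration, that every message carries a monotonically increasing integer sequence number and that both the failed primary and the replica keep a journal of the numbers they saw. The function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def find_sequence_gaps(seqs: Iterable[int]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of sequence numbers missing from seqs.

    Assumption: sequence numbers are integers that increase by 1 per message.
    """
    ordered = sorted(set(seqs))
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > 1:
            gaps.append((prev + 1, cur - 1))
    return gaps


def missing_on_replica(primary_log: Iterable[int], replica_log: Iterable[int]) -> List[int]:
    """Sequence numbers the primary journaled that never reached the replica."""
    return sorted(set(primary_log) - set(replica_log))


if __name__ == "__main__":
    # Hypothetical journals recovered after a failover.
    primary = [1, 2, 3, 4, 5, 6]
    replica = [1, 2, 4, 6]
    print(find_sequence_gaps(replica))
    print(missing_on_replica(primary, replica))
```

Even in this toy form, the check needs both journals and a full pass over them, which hints at why doing it after every unplanned reboot adds up.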