I may have forgotten to explain this in the talk, but you can (in most cases) pick a log segment with a local leader, because, in general, every DC will have leaders for one (or more) log segments. If all local leaders are down, yes, you'd have to proxy, which would add hops.
Replying to @jepsen_io @fauna
Nathan VanBenschoten (@natevanben):
That all makes sense and was covered well! The question is whether you have to wait to hear what was committed by remote log segments (3 WAN hops, see https://twitter.com/natevanben/status/1196457975407370240) before applying txns that were committed by a local log segment (2 WAN hops).
Replying to @natevanben @fauna
Oh, I think I get what you're driving at here. Yes, I think you need a commit index, and specifically one for the decision to seal the log window, but I don't think that's in the blocking path of txns--if you couple windows to the Raft term, the leader can seal independently
Replying to @jepsen_io @natevanben
... one thing I hadn't considered until now, though: if we assume clocks are perfectly synced and all nodes seal simultaneously, then yes, you do need to wait a third inter-DC hop for remote leaders to inform you of the window being sealed...
@fauna, care to jump in here?
Replying to @jepsen_io @natevanben
Fauna varies from Raft here: followers learn of commit from a quorum of peers. In a 3-replica cluster, a remote replica can seal an epoch after one WAN hop, since it only needs to hear from the leader and itself. With more than 3 nodes, this increases to 2 hops.
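The quorum arithmetic behind that claim can be sketched as follows. This is a toy illustration, not Fauna's actual code; the function and replica names are hypothetical. The idea is that a follower may treat an entry as committed once acknowledgements from a strict majority are visible to it, counting its own local fsync as one ack.

```python
# Toy sketch of quorum-based commit learning (hypothetical names, not
# Fauna's real implementation).

def has_quorum(acks: set[str], cluster: set[str]) -> bool:
    """True once a strict majority of the cluster has acknowledged."""
    return len(acks & cluster) > len(cluster) // 2

# 3-replica cluster: a remote follower needs only the leader's append
# (one WAN hop) plus its own local fsync to reach a 2-of-3 quorum.
cluster3 = {"leader", "follower_a", "follower_b"}
print(has_quorum({"leader", "follower_a"}, cluster3))  # True

# With 5 replicas, leader + self is only 2 of 5, so the follower must
# hear from at least one more peer -- the second hop mentioned above.
cluster5 = cluster3 | {"follower_c", "follower_d"}
print(has_quorum({"leader", "follower_a"}, cluster5))                  # False
print(has_quorum({"leader", "follower_a", "follower_b"}, cluster5))   # True
```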
Replying to @mf @jepsen_io
That's a very cool improvement that seems well-suited for Fauna's architecture.
@CockroachDB opted to fsync the leader's log in parallel with followers (section 10.2.1 of the Raft thesis), which has a different set of benefits and appears to be incompatible with this approach.
Replying to @natevanben @jepsen_io
We can do this, too! Our leader implementation sends an append, and then its own accept broadcast once the entry has been fsynced. We found through testing that the extra messages were worth it for the latency profile improvement.
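The message ordering described here can be illustrated with a small event trace. This is a hypothetical sketch of the pattern, not Fauna's implementation: the leader ships the Append immediately, fsyncs locally, and only then broadcasts its own Accept, so followers can count the leader toward quorum without an extra round trip back to it.

```python
# Hypothetical sketch: leader sends Append first, then its own Accept
# broadcast once its local fsync has completed.

def leader_replicate(entry, followers, fsync):
    events = []
    for f in followers:                  # ship the entry right away
        events.append(("append", f, entry))
    fsync(entry)                         # entry is now durable on the leader
    for f in followers:                  # broadcast the leader's own vote
        events.append(("accept", f, entry))
    return events

trace = leader_replicate("e1", ["a", "b"], fsync=lambda e: None)
print(trace)
```

Every Append precedes every Accept in the trace, which is the extra-messages-for-better-latency trade-off the tweet describes.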
Replying to @mf @natevanben
Does that mean that the effective txn commit latency is actually 3, not 2, network hops? 2 in the actual commit path, and an additional one implied by waiting for a log segment sealing message from remote leaders?
Replying to @jepsen_io @natevanben
No, because the other log segments are committing their batches for the epoch in parallel (modulo clock jitter). I drew some diagrams for the 3-replica case: pic.twitter.com/j4KZeIujil
Replying to @mf @jepsen_io
I forgot to mention that the diagrams exclude the Accepted message broadcast to the other followers.
Oh, whoops, that's right. I was thinking about txns going into the log as one paxos op, and log sealing being a second process. But of course, the only actual paxos op is "This is the entire log segment"; no separate blocking phase required.