Weird new (to Twitter) paper with and . The title basically tells the whole story:
Merge What You Can, Fork What You Can’t:
Managing Data Integrity in Local-First Software
riffle.systems/papoc22.pdf
In a few more words...
Conversation
In the Riffle project we're excited about local-first software. But if software is going to be local-*first* instead of local-*only*, we need a way to sync across clients.
After years working on CloudKit, this is a near and dear problem.
1
6
A popular solution to local-first sync is a CRDT (conflict-free replicated data type). These are cool data structures that can automatically merge in changes made offline, while ensuring that all clients eventually agree on the state.
CRDTs are great, but...
Replying to
By allowing any change to merge in no matter when it was made, CRDTs struggle with strong integrity constraints (foreign key, uniqueness, etc.). They also don't usually allow transactions.
1
3
And if you're thinking "well, I'm not a bank, I don't need transactions": the paper has a cute example of an instance where even a *photo manager* needs to maintain transactional consistency.
1
7
I'm pretty sure there's a Greenspun-type law here: any program built on top of a non-transactional database will eventually grow an ad hoc, informally-specified, bug-ridden, slow implementation of half of a transactional database.
So what can we do? It's in the title!
1
8
Instead of forcing everything to merge, "merge what you can, fork what you can't".
If an offline change doesn't conflict with anything, great, merge it! If it does: keep both versions around. We call this the "forking histories model".
1
4
When I talk to programmers about this, they mostly ask "isn't this just git?"
I'd argue that git is an instance of the forking histories model for the data structure of plain text that gets compared and merged via diff/patch.
But most data is not text!
1
3
The second half of our paper is a proposal that we called TreeDB, which is an instance of forking histories model for structured data. It's basically git, but replacing plain text diffing with *conflict sets*, a technique that we lifted from multi-version concurrency control.
1
5
We took our conflict set definition ~verbatim from FoundationDB!
One cool thing that gives you is an existence proof of how powerful they are: my team at Apple built an entire relational database where all the transactions were defined by conflict sets!
1
5
This was just a workshop paper with a cool proposal, not an implementation.
If you're interested in forking histories/TreeDB, whether to prove theorems about it or try implementing it, I want to talk to you! I will seriously talk your ear off about this stuff if you let me. 😅
3
1
5
And if you want an exercise to test your understanding: it turns out that there’s a bug in one of our CRDT simulation claims. We’ll write something up at some point but it could be fun to try to find it. 😉
2
