"One size fits all" was always kind of wrong - he said so himself back then. What changed? You might think that the thesis hung on something about main memory size scaling, but that doesn't seem relevant to many of the specific kinds of workloads he mentions.
In short, it seems like he was mostly wrong because: 1.) He failed to consider performance _relative to the total cost of ownership_, 2.) Having a memory hierarchy is really helpful when you think about costs in a holistic fashion, and 3.) Minimizing complexity really matters.
4) Motivated reasoning, because more specialized database companies are a lot easier to start
5) Underestimated application-side scale-out techniques, which are necessary for pure machine-size reasons anyway
I don't think it's so clear. [3) complexity] was always my driver. If one PostgreSQL database does the job, I'd take it over five special-purpose DB technologies and three data streaming stacks... fewer things to break and page me on the weekend, please!
It's even worse than that, though. Stonebraker suggested that everybody would use something like VoltDB for OLTP, with an ETL process moving the data into a column store so you could run analytics on the same data later. That idea in particular was always preposterous.
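For concreteness, the two-system pattern he was describing looks roughly like the sketch below, with sqlite3 standing in for the OLTP engine and a Parquet file standing in for the column store; the orders table and its columns are invented for illustration.

```python
# Minimal sketch of the OLTP -> ETL -> column store pattern: transactional
# writes land in a row store, and a periodic batch job copies them into a
# columnar format that analytics queries can scan efficiently.
import sqlite3

import pyarrow as pa
import pyarrow.parquet as pq

# --- OLTP side: an ordinary row store taking transactional writes ---
oltp = sqlite3.connect("orders.db")
oltp.execute(
    "CREATE TABLE IF NOT EXISTS orders ("
    "  id INTEGER PRIMARY KEY, customer TEXT, amount REAL, ts TEXT)"
)
oltp.execute(
    "INSERT INTO orders (customer, amount, ts) VALUES (?, ?, ?)",
    ("alice", 19.99, "2024-01-01T12:00:00"),
)
oltp.commit()

# --- ETL step: extract the rows and load them into columnar storage ---
cursor = oltp.execute("SELECT id, customer, amount, ts FROM orders")
names = [desc[0] for desc in cursor.description]
rows = [dict(zip(names, row)) for row in cursor.fetchall()]

# Parquet is column-oriented, so the analytics side can read only the
# columns it needs, which is exactly the property a column store exploits.
pq.write_table(pa.Table.from_pylist(rows), "orders_analytics.parquet")

# --- Analytics side: scan a single column from the columnar copy ---
amounts = pq.read_table("orders_analytics.parquet", columns=["amount"])
print("total revenue:", sum(amounts.column("amount").to_pylist()))
```

Keeping the two copies in sync is exactly the kind of extra moving part that point 3) above (complexity) is complaining about.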
Not so preposterous if you look at modern cloud-born companies' data pipelines, which all seem to include Postgres, Redis, Snowflake, & Kafka. That said, I'd still say #3 (operational complexity) is the mind-killer for most; training/expertise/resilience is still too costly w/o scale.
Yah. You get two tiers: a handful of really big operations (e.g. Facebook, Walmart) use All The Data Tools, and then there's everyone else, who just wants one "good enough" option for data storage.
Stonebraker made the error of looking at DB cost of ownership strictly in terms of *hardware* cost. He forgot about people-time cost and cognitive load.
I think that it's both. I agree that under-appreciating the cost and difficulty of employing experts was his largest mistake, practically speaking. I find his under-appreciation of the value of a separate transparent capacity tier more interesting as an engineer, though.
Yah. He wasn't *totally* wrong though; if you look around, you'll find that the overall usage of special-purpose databases has tremendously increased since 2005.
I agree with that, too. I just don't think that it had much to do with any fundamental sea change. Stonebraker talks about arrays being more natural than tables for scientific applications. I'm sure that that's true, but it's also true that the same people still use FORTRAN.
That had 100% to do with the fact that he was leading the SciDB project at the time, which was array-based.
It's true, though, that matrix-based storage is an excellent way to go for advanced analytics. But... you can cram a matrix into a regular DB.
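As a rough sketch of what that can look like: the classic trick is a coordinate (row, column, value) table, at which point even a matrix multiply is just a join plus an aggregate. The table and column names below are invented for illustration.

```python
# Cramming a matrix into a regular DB: store each entry as an
# (i, j, value) triple, i.e. the coordinate layout.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE matrix (i INTEGER, j INTEGER, v REAL, PRIMARY KEY (i, j))")

# A small 2x2 matrix: [[1, 2], [3, 4]]
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
db.executemany("INSERT INTO matrix (i, j, v) VALUES (?, ?, ?)", entries)

# Matrix multiply A * A expressed as a self-join plus GROUP BY.
result = db.execute(
    """
    SELECT a.i, b.j, SUM(a.v * b.v) AS v
    FROM matrix AS a
    JOIN matrix AS b ON a.j = b.i
    GROUP BY a.i, b.j
    ORDER BY a.i, b.j
    """
).fetchall()
print(result)  # [(0, 0, 7.0), (0, 1, 10.0), (1, 0, 15.0), (1, 1, 22.0)]
```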



