It’s 100x faster than standard PostgreSQL for analytics workloads. That’s not as hard as it sounds, as standard Postgres is row-oriented, not columnar. It’s not that radical either: other column databases exist with that performance, and there’s a variety of hybrid solutions.
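To make the row-oriented vs. columnar point concrete, here is a minimal Python sketch. The table, column names and sizes are made up for illustration, and this is not how Postgres or any columnar engine is implemented; it only shows why an analytic query that needs one column has less data to touch in a column layout.

```python
from array import array

# Hypothetical "orders" table: (order_id, region, amount).
# Row-oriented layout: one tuple per row, all fields stored together.
rows = [(i, i % 7, float(i) * 1.5) for i in range(1000)]

# Column-oriented layout: one contiguous typed array per column.
order_id = array("q", (r[0] for r in rows))
region = array("q", (r[1] for r in rows))
amount = array("d", (r[2] for r in rows))


def sum_amount_row_store(table):
    # Walks every full row, even though only one field is needed.
    return sum(r[2] for r in table)


def sum_amount_column_store(col):
    # Reads only the single column the query asks for.
    return sum(col)


row_total = sum_amount_row_store(rows)
col_total = sum_amount_column_store(amount)
```

Both functions return the same total; the difference is how much data each layout has to read to get there, which is where most of the analytics speedup of a columnar format comes from.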
Yes, was thinking the same; would be interesting to see a comparison to other analytics systems with columnar data representation. They do seem to employ quite a few cool techniques for querying though.
Here’s a comparison to another database, SingleStoreDB, with columnar data representation. (Actually, it’s universal storage, an evolved columnstore that also supports transactions.)
Btw, I removed both AlloyDB and Synapse Serverless. Life is too short: any DB engine that doesn't return TPC-H SF10 in less than 100 seconds (warm) is not my thing.
It is 100x better, but only for "some" analytical queries. What I don't understand: don't they know that random people on the internet will test it and write shitposts about them? Why make a big claim that is easy to debunk?
If you read the article, you see that some queries benefitted more (>100x), some less (19x).
Testing with your own data model, data and app code shows the reality, but the 1st step is to understand which architectural difference makes "Google Postgres" faster than vanilla.
The data retrieval (decompression, filter and projection pushdown, scale-out) is the easy part compared to making large aggregations (especially with distinct) and joins scale. I expect large joins and aggregations to benefit much less from this columnar cache architecture.
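A rough Python sketch of why that is, under made-up data and hypothetical function names (not any engine's actual execution model): a pushed-down filter plus a plain count produces tiny per-partition results that simply add up, while a distinct count needs per-partition state that must be merged, because the same key can appear in several partitions.

```python
# Two hypothetical partitions of (customer, amount) rows.
partitions = [
    [("a", 10), ("b", 5), ("a", 20)],
    [("a", 30), ("c", 15)],
]
MIN_AMOUNT = 10


def count_matching(partition, min_amount):
    # Filter pushdown + count: each partition returns a single integer.
    return sum(1 for _, amt in partition if amt >= min_amount)


def distinct_customers(partition, min_amount):
    # Filter + projection pushdown: only the customer column of matching rows.
    # The partial state is a set whose size grows with the data.
    return {cust for cust, amt in partition if amt >= min_amount}


# Plain count: partial results are summed, no coordination beyond that.
total_rows = sum(count_matching(p, MIN_AMOUNT) for p in partitions)

# Distinct count: summing per-partition distinct counts over-counts,
# so the sets themselves have to be shipped and merged.
partial_sets = [distinct_customers(p, MIN_AMOUNT) for p in partitions]
naive_distinct = sum(len(s) for s in partial_sets)
true_distinct = len(set().union(*partial_sets))
```

Here the naive sum reports 3 distinct customers while the merged result is 2. That merge step (and the equivalent shuffle for large joins) is the part a faster columnar scan does not help with.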
I did test it the first day it was released and I knew something was odd :). Notice that the queries they are using in the blog are extremely simple (as you said); they did not even use TPC-H Query 1, which is the simplest and most used in any typical analytical workload.
(I've been on the selling side of tech product marketing myself, so I understand why a vendor would want to make noise only about the best results).
And that's exactly why my 1st question is: "Which radical architectural change gives this radical performance increase" ...
... and take it from there. With some background knowledge of such systems, you can reason about whether a bad test result is just a matter of (insufficient) implementation, or whether the architecture was never meant to solve the bottleneck your current test is hitting.
Doing well on TPC-H is all about not getting stuck on various "choke points", the specifics of which vary from query to query. Seems like that might make a direct comparison uninformative (more than usual, even). See: vldb.org/pvldb/vol13/p1





