I've tested this on occasion in the past. The problem is that the relative performance in benchmarks will depend wholly on the fraction of the time attributable to count(), so it's never the same between different queries.
Replying to @RhodiumToad @lukaseder
Picking a single query and measuring the overhead specific to count() (this isn't simple) suggests that count(1) is in fact 50% slower than count(*): on the machine I tested on, that works out to roughly 4 ns for count(*) and 6 ns for count(1).
Replying to @RhodiumToad @lukaseder
But the rest of the query usually drowns that out; e.g. in my tests that amounted to a difference of 2 seconds on a query taking 47 seconds.
Replying to @RhodiumToad @lukaseder
For example, consider the performance difference between "select count(1) from ...", "select count(1), count(2) from ...", and "select count(1), count(*) from ..." on the same input. (Note that the two count() calls MUST be different or they will be combined)
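A minimal sketch of that measurement, assuming a synthetic table built with generate_series (the table name, row count, and use of EXPLAIN ANALYZE are my illustration, not from the thread):

    -- Hypothetical test table; name and row count are illustrative.
    create table t as select g from generate_series(1, 10000000) g;

    -- Baseline: one constant count.
    explain analyze select count(1) from t;

    -- Two *different* constant counts: count(1) and count(2) are kept as
    -- separate aggregates, so the extra runtime over the baseline is the
    -- cost of one count(const).
    explain analyze select count(1), count(2) from t;

    -- Swap the second aggregate for count(*): the delta against the
    -- previous query isolates count(const) vs. count(*).
    explain analyze select count(1), count(*) from t;

Differencing the second and third runs against the baseline, then dividing by the row count, approximates the per-row cost of a single count() call.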
Replying to @RhodiumToad @l_avrot
2/47 seconds is still over 4%, though. Does the parser have some pre-existing way of recognising constant expressions like "1"? Might be low-hanging fruit... It probably isn't, though, as these things are often more complicated than they seem from the outside.
Replying to @petereisentraut @lukaseder
It does indeed, though testing with pg11 still shows a fairly consistent difference of around 0.5% in favour of count(*); I'm not sure why. (My test query as a whole gets about a 25% speedup from JIT.)
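For reference, a sketch of how the JIT comparison might be run on pg11, where JIT compilation is controlled per session by the jit setting (the table t is the hypothetical one from the earlier sketch):

    set jit = off;
    explain analyze select count(1), count(*) from t;  -- interpreted expression evaluation

    set jit = on;  -- JIT only engages when the plan cost exceeds jit_above_cost
    explain analyze select count(1), count(*) from t;  -- output gains a JIT section

With jit = on, the EXPLAIN ANALYZE output reports generation, inlining, and optimization times, which helps separate JIT overhead from the speedup it buys.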
Replying to @RhodiumToad @petereisentraut
/me needs to get back to improving the generated JIT code, so LLVM can properly optimize the aggregation code (and e.g. recognize the constant(s)).
Replying to @AndresFreundTec @RhodiumToad
Because there are a lot of pointer constants in the generated IR, LLVM often doesn't recognize that many of the expression steps could be constant-folded. A similar problem exists with optimizing away stores/loads to/from FunctionCallInfo.
Replying to @AndresFreundTec @RhodiumToad
Which unfortunately means that even though the function body gets inlined, we still do argument/result passing through FunctionCallInfo in most cases. While the inlining removes the call overhead etc., the different functions won't be efficiently combined.
Oh, there's a stupid regression in v12/master. We don't generate tuple deforming anymore for most !scan nodes. But I think that shouldn't matter for the artificial benchmark in this case.