Damn it, now we're going to have to make hashing faster
Conversation
Replying to
hash_mem_multiplier's default was increased to 2.0 in 15, so that's something! Though it's equally relevant to hashing and sorting.
1
Maybe the cost model of sorting should be adjusted now that it has become more efficient. The constant factor should depend on the sort input type (compare cost) and row size (swap cost).
1
That's probably a good idea. IIRC the other problem with cost_sort() is that the CPU cost of external sorts is too pessimistic.
1
Agreed. Btw, regarding sort templating and specialisations: Why not try radix sort for integers or even for strings. One could even take into account the column statistics to choose between quicksort and radix sort and use a costing model behind that. What‘s your opinion on that?
1
The difficulty often comes from real world data having lots of variability. The worst case matters a lot. Alphasort paper uses a car analogy for this: "street-legal versus formula one". The fastest cars cannot be driven on public roads. That might be relevant to radix sort ideas.
Now that SSE2 intrinsics are available, would it maybe make sense to integrate a vectorization path in the sort_template.h code along the lines of this? arxiv.org/pdf/2205.05982
2
1


