IEEE 754 semantics aren't trivial despite the short spec. I implemented binary_float and binary_double for Oracle and did my best to make them IEEE 754 compliant (more so than the competition at the time), but there were still compromises:
I thought that Oracle only had a handful of internal types (number + varchar?), with only a few hard-coded comparators needed for indexing. The rumor I'd heard was that these are mapped to user-visible datatypes in a linear way (e.g., timestamps are really number). Is that true?
I am comfortable claiming that if you care about performance then you want datatypes for which memcmp is sufficient for comparisons (sorting, indexing, grouping).
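To make "memcmp is sufficient" concrete, here is a minimal sketch of one well-known way to turn an IEEE 754 double into 8 bytes whose unsigned byte order matches numeric order. It is not Oracle's actual on-disk format (that is not described in this thread). The function name `encode_double` and the choices to reject NaN and to collapse -0.0 into +0.0 are assumptions made for illustration. Python's `bytes` comparison is lexicographic on unsigned bytes, so it stands in for memcmp here.

```python
import struct

def encode_double(x):
    """Map a double to 8 bytes whose unsigned byte order matches numeric order.

    Assumptions of this sketch (not taken from any real DBMS):
    NaN is rejected, and -0.0 is collapsed into +0.0 so both get one key.
    """
    if x != x:
        raise ValueError("NaN needs an ordering policy; not handled in this sketch")
    if x == 0.0:
        x = 0.0  # -0.0 == 0.0 is True, so this normalizes the sign of zero
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    if bits & (1 << 63):
        bits ^= 0xFFFFFFFFFFFFFFFF  # negative: invert every bit
    else:
        bits |= 1 << 63  # non-negative: set the sign bit
    return struct.pack(">Q", bits)  # big-endian, so byte order is value order

values = [2.5, float("-inf"), -1.5, 0.0, 1e-300, float("inf"), -1e-300, 1.0]
by_bytes = sorted(values, key=encode_double)  # bytes compare like memcmp
```

The NaN and signed-zero handling is exactly the kind of place where a byte-comparable format forces a decision that the IEEE 754 comparison rules do not make for you.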
As disclosed in the patent for the sort algorithm I added to Oracle, the new sort requires that. I am not sure whether the per-datatype representations have been publicly explained, so I hesitate to share them.
Does Postgres use memcmp for order/group by on a single column? Are the byte-comparable formats of the columns concatenated so that a single call to memcmp can be used? At this point, I don't remember the answer for MySQL.
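A rough sketch of what "concatenate the byte-comparable formats" can look like for a two-column key (text, then 64-bit integer). The helper names and the escaping scheme (0x00 written as 0x00 0xFF, with 0x00 0x00 as terminator) are assumptions for illustration, not the format of Postgres, MySQL, or Oracle. Text is ordered by UTF-8 bytes here, which ignores collation.

```python
import struct

def encode_int64(n):
    # Add a bias of 2**63 so negative values sort before positive ones,
    # then store big-endian so byte order equals numeric order.
    return struct.pack(">Q", n + (1 << 63))

def encode_text(s):
    # Variable-length columns need an unambiguous end marker, otherwise
    # bytes of the next column leak into the comparison of this one.
    raw = s.encode("utf-8")
    return raw.replace(b"\x00", b"\x00\xff") + b"\x00\x00"

def encode_row(name, n):
    # One byte string per row: a single memcmp-style compare orders
    # by name first, then by n.
    return encode_text(name) + encode_int64(n)

rows = [("ab", -1), ("a", 5), ("a\x00", 7), ("a", -3), ("b", 0)]
by_bytes = sorted(rows, key=lambda r: encode_row(*r))
```

The terminator is the part that is easy to get wrong: without it, ("a", 5) and ("ab", -1) could compare on a mix of text bytes and integer bytes.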
It depends. We use something called abbreviated keys for sorting, aka the prefix sorting technique from the AlphaSort paper (my code, actually). We don't use conditioned binary keys ("normalized keys") for index scans, etc. We probably should. I wrote about it: wiki.postgresql.org/wiki/Key_norma
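A minimal sketch of the abbreviated-key idea as described above: cache a fixed-width prefix of each key as an integer, compare those first, and fall back to the full key only on a tie. This is not the Postgres implementation; `abbrev_sort`, `ABBREV_LEN`, and the `stats` counter are hypothetical names, and the keys are assumed to already be byte-comparable.

```python
from functools import cmp_to_key

ABBREV_LEN = 8  # hypothetical prefix width, one machine word

def abbrev_sort(keys, stats=None):
    """Sort byte strings, comparing cached 8-byte prefixes before full keys."""
    # Zero-pad short keys; 0x00 is the smallest byte, so an unequal
    # padded prefix still decides the order of the full keys correctly.
    items = [
        (int.from_bytes(k[:ABBREV_LEN].ljust(ABBREV_LEN, b"\x00"), "big"), k)
        for k in keys
    ]

    def compare(a, b):
        if a[0] != b[0]:  # cheap integer compare resolves most pairs
            return -1 if a[0] < b[0] else 1
        if stats is not None:  # tie on the prefix: count the fallback
            stats["full"] = stats.get("full", 0) + 1
        return (a[1] > b[1]) - (a[1] < b[1])

    items.sort(key=cmp_to_key(compare))
    return [k for _, k in items]

short_stats, long_stats = {}, {}
short_keys = [b"delta", b"alpha", b"charlie", b"bravo"]
long_keys = [b"customer/0000/" + s for s in short_keys]  # shared 14-byte prefix
sorted_short = abbrev_sort(short_keys, short_stats)
sorted_long = abbrev_sort(long_keys, long_stats)
```

With the short keys no comparison needs the full key. With the long keys every comparison ties on the prefix and falls through, which is the case where the cached prefix stops paying for itself.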
Will read the link soon. My concern with caching the prefix is that (waves hands) keys in a DBMS sort are frequently much longer than 4 or 8 bytes, so caching the prefix looks much better on a sort benchmark than in the long-key case.
True - but the overall picture is complicated. The *combined* improvement to both temporal and spatial locality matters. A quicksort pivot elem naturally gets compared again and again, around the same time. Optimization may only "start to fail" when other elements are in cache.
And with strxfrm() style binary keys (to back a strcoll() text comparator), we can get really far with just a prefix. Chances are good that if we have to tiebreak we can use a strcmp()-exact-match fast path, without calling strcoll().
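A hedged sketch of that tiebreak order, using Python's `locale.strxfrm` and `locale.strcoll` as stand-ins for the C functions. The function name `collated_sort`, the 8-character prefix, and the `stats` counters are assumptions for illustration. Under the default "C" locale `strxfrm` is effectively the identity, so this shows only the control flow, not real collation behavior.

```python
import locale
from functools import cmp_to_key

def collated_sort(strings, prefix_len=8, stats=None):
    """Sort text by a cached strxfrm() prefix, then exact match, then strcoll()."""
    if stats is None:
        stats = {}
    # The transformed prefix is computed once per string, not once per compare.
    items = [(locale.strxfrm(s)[:prefix_len], s) for s in strings]

    def compare(a, b):
        if a[0] != b[0]:  # prefix alone decides the order
            return -1 if a[0] < b[0] else 1
        if a[1] == b[1]:  # exact-match fast path, no collation call needed
            stats["exact"] = stats.get("exact", 0) + 1
            return 0
        stats["strcoll"] = stats.get("strcoll", 0) + 1
        return locale.strcoll(a[1], b[1])  # expensive authoritative comparator

    items.sort(key=cmp_to_key(compare))
    return [s for _, s in items]

words = ["pear", "apple", "pear", "fig",
         "internationalization", "internationalisation"]
stats = {}
result = collated_sort(words, stats=stats)
```

Here the duplicate "pear" is settled by the equality check, and only the two long words that share their first 8 characters reach `strcoll`.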
With all that said, it could definitely be improved - no question. I'm just making the point that "avoiding visiting Andromeda" was what *really* mattered. Having the right general idea was the truly important thing.