IEEE 754 semantics aren't trivial despite the short spec. I implemented binary_float and binary_double for Oracle and did my best to make them IEEE 754 compliant (more so than the competition at the time), but there were still compromises:
I thought that Oracle only had a handful of internal types (number + varchar?), with only a few hard-coded comparators needed for indexing. The rumor I'd heard was that these are mapped to user-visible datatypes in a linear way (e.g., timestamps are really number). Is that true?
I am comfortable claiming that if you care about performance then you want datatypes for which memcmp is sufficient for comparisons (sorting, indexing, grouping).
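To make "memcmp is sufficient" concrete, here is a minimal sketch of one common way to encode 64-bit integers and doubles so that plain byte comparison matches numeric order. The function names are hypothetical and this is not a description of Oracle's internal formats. Python's `bytes` comparison is unsigned and lexicographic, so it stands in for memcmp on equal-length keys.

```python
import struct

SIGN_BIT = 1 << 63
ALL_BITS = 0xFFFFFFFFFFFFFFFF

def int64_key(n: int) -> bytes:
    # Adding 2**63 flips the sign bit, mapping signed order onto unsigned order;
    # big-endian puts the most significant byte first.
    return struct.pack(">Q", n + SIGN_BIT)

def float64_key(x: float) -> bytes:
    # Reinterpret the double as its raw 64-bit pattern.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits & SIGN_BIT:
        bits ^= ALL_BITS   # negative: invert everything so larger magnitude sorts first
    else:
        bits |= SIGN_BIT   # non-negative: set the sign bit so it sorts after negatives
    return struct.pack(">Q", bits)
```

One compromise of this particular sketch: -0.0 and 0.0 get different keys even though IEEE 754 treats them as equal, and NaNs land at the extremes instead of being unordered. A real implementation has to pick a policy for both.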
As disclosed in the patent for the sort algorithm I added to Oracle, the new sort requires that. I am not sure whether the per-datatype representations have been publicly explained, so I hesitate to share them.
Does Postgres use memcmp for order/group by on a single column? Are the byte-comparable formats of the columns concatenated so that a single call to memcmp can be used? At this point, I don't remember the answer for MySQL.
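A rough sketch of what concatenating byte-comparable columns could look like, for a hypothetical two-column key (a variable-length byte string followed by a 64-bit integer). The escaping scheme and all names are assumptions for illustration, not what Postgres, MySQL, or Oracle actually do. The variable-length column needs a terminator, otherwise bytes of the next column would be compared against bytes of a longer string.

```python
import struct

def int64_key(n: int) -> bytes:
    # Sign-flipped big-endian encoding: byte order matches numeric order.
    return struct.pack(">Q", n + (1 << 63))

def bytes_key(b: bytes) -> bytes:
    # Escape embedded zero bytes, then terminate with 00 00 so that a shorter
    # string always sorts before any longer string it is a prefix of.
    return b.replace(b"\x00", b"\x00\xff") + b"\x00\x00"

def row_key(name: bytes, qty: int) -> bytes:
    # One concatenated key per row; a single byte comparison orders by
    # (name, qty).
    return bytes_key(name) + int64_key(qty)
```

Sorting rows with `key=lambda r: row_key(*r)` then gives the same order as comparing the `(name, qty)` tuples column by column.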
It depends. We use something called abbreviated keys for sorting, aka the prefix sorting technique from the AlphaSort paper (my code, actually). We don't use conditioned binary keys ("normalized keys") for index scans, etc. We probably should. I wrote about it: wiki.postgresql.org/wiki/Key_norma
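The abbreviated-key idea can be sketched as follows. This is a toy illustration, not the Postgres implementation: the 8-byte width, the names, and the comparison counter are all assumptions. Each item carries a fixed-size integer prefix of its key; the full key is only touched when two prefixes tie.

```python
from functools import cmp_to_key

ABBREV_LEN = 8  # assumed width: one machine word

def abbreviate(key: bytes) -> int:
    # The first 8 bytes, zero-padded, as a big-endian integer. If two
    # abbreviations differ, the full keys are ordered the same way.
    return int.from_bytes(key[:ABBREV_LEN].ljust(ABBREV_LEN, b"\x00"), "big")

def sort_with_abbrev(keys):
    stats = {"full_compares": 0}
    items = [(abbreviate(k), k) for k in keys]

    def compare(a, b):
        if a[0] != b[0]:                 # cheap integer comparison decides
            return -1 if a[0] < b[0] else 1
        stats["full_compares"] += 1      # tie: fall back to the full key
        return (a[1] > b[1]) - (a[1] < b[1])

    items.sort(key=cmp_to_key(compare))
    return [k for _, k in items], stats
```

How often `full_compares` is incremented depends entirely on how much entropy the first bytes of the keys carry, which is the concern raised in the next reply.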
Will read the link soon. My concern with caching the prefix is that (waves hands) keys in a DBMS sort are frequently much longer than 4 or 8 bytes, so caching the prefix looks much better on a sort benchmark than in the long-key case.
True - but the overall picture is complicated. The *combined* improvement to both temporal and spatial locality matters. A quicksort pivot element naturally gets compared again and again, around the same time. The optimization may only "start to fail" when other elements are in cache.
And with strxfrm() style binary keys (to back a strcoll() text comparator), we can get really far with just a prefix. Chances are good that if we have to tiebreak we can use a strcmp()-exact-match fast path, without calling strcoll().
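A small sketch of that tiebreak order, using Python's `locale.strxfrm` and `locale.strcoll` as stand-ins for the C functions. The prefix length, function names, and call counter are assumptions, and this is not the Postgres code. The comparator tries the transformed prefix first, then an exact-equality check, and only calls `strcoll` when both fail to decide.

```python
import locale
from functools import cmp_to_key

PREFIX_CHARS = 8  # assumed: how much of the strxfrm() output is kept per item

def collated_sort(words):
    stats = {"strcoll_calls": 0}
    # Keep only a prefix of the transformed key next to each string.
    items = [(locale.strxfrm(w)[:PREFIX_CHARS], w) for w in words]

    def compare(a, b):
        if a[0] != b[0]:
            # The transformed prefixes differ, so they already decide the order.
            return -1 if a[0] < b[0] else 1
        if a[1] == b[1]:
            # Exact-match fast path: identical strings never need strcoll().
            return 0
        stats["strcoll_calls"] += 1
        return locale.strcoll(a[1], b[1])

    items.sort(key=cmp_to_key(compare))
    return [w for _, w in items], stats
```

The result follows whatever `LC_COLLATE` locale is active in the process. With many duplicate values, most ties resolve on the equality check and `strcoll_calls` stays small.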
With all that said, it could definitely be improved - no question. I'm just making the point that "avoiding visiting Andromeda" was what *really* mattered. Having the right general idea was the truly important thing.