musl's strlen() is a work of art except for the hopefully-harmless UB http://git.musl-libc.org/cgit/musl/tree/src/string/strlen.c …
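For context, here is a minimal sketch of the well-known "does this word contain a zero byte" test that word-at-a-time string scans rely on. The tweet does not say which UB is meant. Scans of this kind typically read the string through a wider type and may touch bytes past the terminator, so that is an assumption here. The helper names `has_zero_byte` and `load_word` are hypothetical, and the sketch loads through `memcpy` from a buffer known to be a full word long rather than casting pointers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* 0x0101...01 and 0x8080...80 for whatever width size_t has. */
#define ONES  ((size_t)-1 / UCHAR_MAX)
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))

/* Returns nonzero iff at least one byte of x is zero. */
static int has_zero_byte(size_t x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}

/* Hypothetical helper: load one word from a buffer that holds at
 * least sizeof(size_t) bytes. memcpy avoids the aliasing question. */
static size_t load_word(const void *p)
{
    size_t w;
    assert(p != NULL);
    memcpy(&w, p, sizeof w);
    return w;
}
```

A scan built on this checks one word per step and falls back to a byte loop once `has_zero_byte` fires.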
@johnregehr The bad news: compilers won't generate this unrolled code even if you ask them to. You have to write it by hand...
@RichFelker gcc 5.3 gives me a nice 8-way unroll with -funroll-all-loops -O3
@johnregehr What's your idea of a "nice unroll"? I get cmpb;leaq;je per iteration. Instead it should just do cmpb;je.
@johnregehr This change makes a 2x performance difference, from "slowest reasonable strlen" to "fastest strlen".
@RichFelker ok, I was figuring a modern core could run those extra insns in parallel
@johnregehr All 3 in parallel? The cmpb and je (if correctly predicted) do run in parallel and that's how we get 1 cycle/byte.
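One way to read the "write it by hand" point is a manually unrolled byte scan, sketched below. Each test uses a constant offset from one base pointer and the pointer advances once per group of four, which is the shape that can compile to a compare and a branch per byte with no per-byte address arithmetic. The function name `strlen_unrolled4` and the unroll factor are assumptions for illustration, not code from the thread or from musl. Whether a given compiler actually emits cmpb;je for it is not guaranteed, so check the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name: a 4-way hand-unrolled byte-at-a-time strlen.
 * The early returns guarantee no byte after the terminator is read. */
size_t strlen_unrolled4(const char *s)
{
    const char *p = s;
    assert(s != NULL);
    for (;;) {
        if (p[0] == '\0') return (size_t)(p - s);
        if (p[1] == '\0') return (size_t)(p - s) + 1;
        if (p[2] == '\0') return (size_t)(p - s) + 2;
        if (p[3] == '\0') return (size_t)(p - s) + 3;
        p += 4; /* one pointer update per four bytes examined */
    }
}
```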