AVX512F strlen is nearly 5 times faster than the library function, and strchr nearly 4.5 times. https://github.com/WojciechMula/toys/tree/master/avx512-string … #avx512
-
I have no idea about the context-switching overhead. It is a really good question. (KNL chips are clocked @ 1.2 GHz, BTW)
@solardiz
-
Not to mention that my avx512 procedures are unsafe. :)
@solardiz -
You mean over-reading data by up to 63 bytes, e.g. in the strlen implementation?
-
Yes, this might crash the program. It can be avoided by reading aligned data, like here: https://github.com/WojciechMula/sse2string/blob/master/src/strlen.S#L38 …
@RichFelker @solardiz -
But the string can end right at a page boundary; it can segfault if you read even one byte too many.
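To make the page-boundary concern concrete, here is a minimal sketch of the arithmetic. It assumes 4 KiB pages and uses a hypothetical helper name, `read_crosses_page`; neither comes from the thread. It only shows when a read of a given width starting at some address spills into the next page, which may be unmapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical helper: returns true if reading `width` bytes starting at
   `addr` touches the page following the one that contains `addr`. */
static bool read_crosses_page(uintptr_t addr, size_t width)
{
    assert(width <= PAGE_SIZE);
    return (addr % PAGE_SIZE) + width > PAGE_SIZE;
}
```

With these assumptions, a 64-byte read that starts on a 64-byte-aligned address never crosses a page, because 4096 is a multiple of 64. An unaligned 64-byte read starting in the last 63 bytes of a page does.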
-
True for my current implementation, but the SSE2 version I linked to does not have this flaw.
@RichFelker @solardiz
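The aligned-read idea can be sketched roughly as follows. This is not the linked assembly, only an illustration written with SSE2 intrinsics under a few assumptions: an x86 target with SSE2, a GCC/Clang compiler (for `__builtin_ctz`), and a hypothetical function name, `strlen_aligned_sse2`. Every load is 16-byte aligned, so no load can straddle a page boundary. The bytes that precede the string in the first block are shifted out of the comparison mask.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Sketch: strlen that issues only 16-byte-aligned loads.
   It still reads bytes outside the string, but never outside
   the pages the string itself occupies. */
static size_t strlen_aligned_sse2(const char *s)
{
    const __m128i zero = _mm_setzero_si128();
    uintptr_t addr = (uintptr_t)s;
    unsigned misalign = (unsigned)(addr & 15);
    const __m128i *p = (const __m128i *)(addr - misalign);

    /* First block may include bytes located before s: drop their mask bits. */
    unsigned mask =
        (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(_mm_load_si128(p), zero));
    mask >>= misalign;
    if (mask != 0)
        return (size_t)__builtin_ctz(mask);

    /* Remaining blocks start at aligned addresses after s. */
    for (;;) {
        ++p;
        mask = (unsigned)_mm_movemask_epi8(
            _mm_cmpeq_epi8(_mm_load_si128(p), zero));
        if (mask != 0)
            return (size_t)((const char *)p - s) + (size_t)__builtin_ctz(mask);
    }
}
```

The same pattern should carry over to wider vectors by aligning to the vector width instead of 16, though that variant is not shown here.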
End of conversation
New conversation -
related question - how does the OS know which SIMD registers (if any) to save when preempting a thread?
-
It seems like it must have to save the SSE/AVX context regardless of whether you use them.
-
that's what I thought. So it'll save all sse/sse2/avx/avx512 registers? That seems like a lot.
-
remember that SSE & AVX registers map to lower parts of AVX512 regs, like AL, AX, EAX
-
true but that's still 1.5 KB to save every time another user thread needs to run?
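A back-of-the-envelope count of that figure, assuming 64-bit mode with 32 ZMM registers of 64 bytes, 8 opmask registers of 8 bytes, and 16 YMM registers of 32 bytes already covered by AVX. The function names are hypothetical, and the real save-area layout is defined by the CPU and is not modeled here.

```c
#include <stddef.h>

enum {
    ZMM_COUNT = 32,   ZMM_BYTES = 64,   /* zmm0..zmm31 in 64-bit mode */
    OPMASK_COUNT = 8, OPMASK_BYTES = 8, /* k0..k7 */
    YMM_COUNT = 16,   YMM_BYTES = 32    /* ymm0..ymm15, aliased by the low ZMM halves */
};

/* Raw size of the AVX-512 register file (vector + opmask registers). */
static size_t avx512_register_bytes(void)
{
    return (size_t)ZMM_COUNT * ZMM_BYTES + (size_t)OPMASK_COUNT * OPMASK_BYTES;
}

/* Bytes that AVX-512 adds on top of the AVX (YMM) state. */
static size_t avx512_extra_bytes(void)
{
    return avx512_register_bytes() - (size_t)YMM_COUNT * YMM_BYTES;
}
```

Under these assumptions the register file is 2112 bytes in total, of which 1600 bytes (about 1.5 KB) are new with AVX-512.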
-
https://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_NA_2014.pdf … does seem to imply that the OS somehow knows when the SIMD state hasn't been used yet.
End of conversation