might already be one on http://jjj.de
-
A few quick and dirty ones: https://gist.github.com/Laksen/a9cc0a81b4f2943f7df49233f26ee530 … Since the bitmanip versions were manually unrolled, I assumed that was allowed, but I kept the slow ones in there. With -O3, some compile to just as few instructions.
-
Thanks! I'm now using your bm64t_baseisa2 as my bm64t_baseisa. (I also have my own bm64t_baseisa2.) In the other cases the optimized versions I wrote seem to perform better. I've also cleaned up the bitmanip code a bit. Btw jfyi, your hxor(x) is equivalent to clmulr(x,-1) afaict. pic.twitter.com/Qg6sQupVbI
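For anyone following along, here is a rough sketch of why a carry-less multiply by all ones behaves like an XOR scan. clmulr_ref follows the reference semantics of the RISC-V clmulr instruction for XLEN=64. xor_scan_right is a hypothetical name, and it is only a guess at the shape of hxor, since the gist's code is not reproduced in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of RISC-V clmulr for XLEN = 64:
 * XOR together rs1 >> (63 - i) for every set bit i of rs2. */
static uint64_t clmulr_ref(uint64_t rs1, uint64_t rs2)
{
    uint64_t x = 0;
    for (int i = 0; i < 64; i++)
        if ((rs2 >> i) & 1)
            x ^= rs1 >> (64 - i - 1);
    return x;
}

/* Hypothetical stand-in for hxor (an assumption, not the gist's code):
 * log-step XOR scan, so bit j of the result is the XOR of bits j..63. */
static uint64_t xor_scan_right(uint64_t x)
{
    x ^= x >> 1;
    x ^= x >> 2;
    x ^= x >> 4;
    x ^= x >> 8;
    x ^= x >> 16;
    x ^= x >> 32;
    return x;
}

/* Returns 1 if both functions agree on a handful of sample inputs. */
static int scan_matches_clmulr(void)
{
    const uint64_t samples[] = { 0, 1, 3, 0x8000000000000000u,
                                 0x0123456789abcdefu, ~(uint64_t)0 };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (clmulr_ref(samples[i], ~(uint64_t)0) != xor_scan_right(samples[i]))
            return 0;
    return 1;
}
```

With rs2 set to all ones, every shift amount from 0 to 63 contributes once, which is the same sum the six doubling steps of the scan build up.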
-
Basically this is just a copy of 512 bytes from (char *)in[] with binary offset aaabbbccc to (char *)out[] with offset aaacccbbb. The fastest way should be 512 separate assignments (a complete loop unroll).
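To make the offset mapping concrete, here is a minimal sketch of the aaabbbccc to aaacccbbb byte permutation written as a plain loop. The function names are hypothetical and chosen only for illustration; this is not the code from the repository under discussion.

```c
#include <assert.h>
#include <string.h>

/* Map a 9-bit byte offset aaabbbccc to aaacccbbb, i.e. swap the two
 * low octal digits and keep the top one. */
static unsigned swap_low_octal_digits(unsigned i)
{
    return (i & 0700) | ((i << 3) & 0070) | ((i >> 3) & 0007);
}

/* Loop version of the copy: byte i of in lands at the swapped offset. */
static void permute_bytes_512(const unsigned char in[512],
                              unsigned char out[512])
{
    for (unsigned i = 0; i < 512; i++)
        out[swap_low_octal_digits(i)] = in[i];
}

/* Swapping two digits twice restores them, so permuting twice must
 * give back the original buffer. Returns 1 when that holds. */
static int permutation_is_involution(void)
{
    unsigned char a[512], b[512], c[512];
    for (unsigned i = 0; i < 512; i++) {
        assert(swap_low_octal_digits(swap_low_octal_digits(i)) == i);
        a[i] = (unsigned char)(i * 7 + 3);
    }
    permute_bytes_512(a, b);
    permute_bytes_512(b, c);
    return memcmp(a, c, sizeof a) == 0;
}
```

For example, offset 0123 (octal) maps to 0132. A fully unrolled version would replace the loop with 512 assignments using constant offsets.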
-
Did you look at the current version on github? It's doing basically that. (The bitmanip version is still over 2x faster.) pic.twitter.com/faq4jL0IT1
-
void bm64_baseisa512(const uint64_t in[64], uint64_t out[64])
{
    const unsigned char *in_b = (const void *)in;
    unsigned char *out_b = (void *)out;

#pragma GCC unroll 512
    for (int i = 0; i < 512; i++)
        out_b[(i & 0700) + ((i << 3) & 0070) + ((i >> 3) & 0007)] = in_b[i];
}