Oh, I had missed that the Galil rule gives Boyer-Moore linear worst-case performance. This might mean that BM is always a better choice than KMP.
Interesting. I wouldn't necessarily think that, although perhaps decryption overhead is the variable I'm missing. What does `pv < OpenSubtitles2018.raw.en > /dev/null` say after dropping caches? If ripgrep can get 4.9 GB/s in cache, then I'd expect it to be as fast as `pv`.
-
`pv` gives me the same numbers: ~800 MiB/s for IO, 4.7 GiB/s for cache. `fio` says I should be able to get 1.6 GiB/s for sequential reads using io_uring, so that's something to try. I should also see what the other IO engines can do, though.
-
Ah I see, okay good. So I think this at least confirms the problem isn't with ripgrep. :-) Thanks for your patience debugging this with me! Especially on an imperfect medium.
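For anyone who wants to repeat the `pv` comparison from this thread without `pv`, here is a rough Python sketch of the same measurement: read a file sequentially and report throughput. The function name `read_throughput` and the 1 MiB block size are assumptions made for illustration, not anything ripgrep or `pv` uses internally. It does not drop the page cache, so a second run on the same file will likely show the cached rate rather than the disk rate.

```python
import time


def read_throughput(path, block_size=1 << 20):
    """Read `path` sequentially; return (total_bytes, MiB_per_second).

    Hypothetical helper for illustration only. The 1 MiB block size
    is an arbitrary choice, not a tuned value.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering, so each read()
    # asks the OS for up to block_size bytes directly.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very small files.
    rate = (total / (1 << 20)) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

Calling `read_throughput("OpenSubtitles2018.raw.en")` once on a cold cache and once more right after should roughly reproduce the IO versus cache gap discussed above, though interpreter overhead may keep the cached figure below what `pv` reports.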