Oh, I had missed that the Galil rule gives Boyer-Moore linear worst case performance. This might mean that BM is always a better choice than KMP.
-
Show this thread
-
And Apostolico claims to place an O(2n) upper bound on BM. That's the same worst case as KMP.
1 reply 0 retweets 2 likesShow this thread -
Continuing the string searching story, I had the brain wave to try ripgrep. Searching for fixed strings in binary files, it tops out at 900 MiB/s. That's the target to meet. I should inspect the code to see how it's done.
2 replies 0 retweets 0 likesShow this thread -
Replying to @chvest
The key to success in this space is less about algorithm and more about making efficient use of the hardware via vectorization. ripgrep's primary literal searcher is here: https://github.com/rust-lang/regex/blob/a0f541bd707a39094d839c1ffd0141d27fe40681/src/literal/imp.rs … And in particular, the single substring case is here: https://github.com/rust-lang/regex/blob/a0f541bd707a39094d839c1ffd0141d27fe40681/src/literal/imp.rs …
1 reply 0 retweets 1 like -
Replying to @burntsushi5 @chvest
Make sure your benchmarks include a healthy variety. 900MB/s sounds a bit slow to me in many cases. For example: https://github.com/rust-lang/regex/blob/a0f541bd707a39094d839c1ffd0141d27fe40681/bench/log/07/rust#L85 …
4 replies 0 retweets 0 likes
My hope is to move the regex code over to using bstr, which provides a Two-Way implementation that also uses the same frequency style heuristics found in regex today: https://github.com/BurntSushi/bstr/blob/f8a6c7031df1748b8469eb94372e2bd53d82ef1c/src/search/twoway.rs …
Loading seems to be taking a while.
Twitter may be over capacity or experiencing a momentary hiccup. Try again or visit Twitter Status for more information.