Very impressed with https://github.com/BurntSushi/ripgrep …
$ time grep -v -x -i -f stoplist words > words-stop
real 100m39.604s
user 97m24.045s
sys 0m13.054s
$ time rg -v -x -i -f stoplist words > words-stop-rg
real 4m9.471s
user 1m5.310s
sys 2m57.231s
One possible explanation is that the first run with GNU grep was reading `words` from disk (it is presumably a large file), but the second run with rg was mostly reading `words` from the I/O cache. You could test this by running GNU grep a few times.
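One rough way to check the cache theory is to repeat the same command and compare the runs. This is a minimal sketch, not anything from the thread: `repeat_timed` is a hypothetical helper name, and `date +%s` only gives whole-second resolution, which is assumed to be good enough for runs that take minutes.

```shell
# Run a command several times and print the wall-clock time of each run.
# If the first run is much slower than the later ones, a cold I/O cache
# is a plausible explanation; if all runs are similar, it is not.
repeat_timed() {
    runs=$1
    shift
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        # Discard the command's output; only the elapsed time matters here.
        # "|| true" because grep exits 1 when no line is selected.
        "$@" > /dev/null || true
        end=$(date +%s)
        echo "run $i: $((end - start))s"
        i=$((i + 1))
    done
}

# Usage with the files named in the thread:
# repeat_timed 3 grep -v -x -i -f stoplist words
```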
Later I can try to narrow down the culprit grep flag(s). I could also maybe try running a profiler or something - let me know if you have any ideas about what might be fruitful.
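Narrowing down the flag could be done by timing every combination of `-v`, `-x` and `-i` while always keeping `-f`. The sketch below assumes the same `stoplist` and `words` files; `time_flag_combos` is a hypothetical name, and on inputs this large it would probably be worth running it against a smaller sample of `words` first.

```shell
# Time one grep run per flag combination, always keeping -f.
time_flag_combos() {
    patterns=$1
    input=$2
    for flags in "" "-v" "-x" "-i" "-v -x" "-v -i" "-x -i" "-v -x -i"; do
        start=$(date +%s)
        # $flags is left unquoted on purpose so it splits into separate options.
        # grep exits 1 when no line is selected; that is not an error here.
        grep $flags -f "$patterns" "$input" > /dev/null || true
        end=$(date +%s)
        echo "flags='$flags': $((end - start))s"
    done
}

# Usage:
# time_flag_combos stoplist words
```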
Looks like grep just has terrible scaling factors wrt the number of patterns (especially with -x) pic.twitter.com/DQJ8UqnGK6
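A measurement like this can be reproduced roughly by feeding grep growing prefixes of the pattern file. This is a sketch under assumptions, not the script behind the plot: `time_pattern_scaling` is a hypothetical name, and the pattern counts are arbitrary.

```shell
# Time grep -x -f with the first N patterns, for each N given.
time_pattern_scaling() {
    patterns=$1
    input=$2
    shift 2
    subset=$(mktemp)
    for n in "$@"; do
        # Keep only the first n patterns for this run.
        head -n "$n" "$patterns" > "$subset"
        start=$(date +%s)
        # grep exits 1 when nothing matches; that is not an error here.
        grep -x -f "$subset" "$input" > /dev/null || true
        end=$(date +%s)
        echo "$n patterns: $((end - start))s"
    done
    rm -f "$subset"
}

# Usage:
# time_pattern_scaling stoplist words 100 1000 10000
```

Replacing `grep` with `rg` inside the function gives the matching ripgrep numbers for comparison.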
I doubt it, the file is only 1.7GB on SSD (laptop, but still).
$ dd if=words of=/dev/null iflag=direct
1718891444 bytes (1.7 GB, 1.6 GiB) copied, 79.8457 s, 21.5 MB/s
(with bs=1048576, 2.5s)