I discovered this while tracking down a discrepancy in match count between ripgrep and grep. ripgrep does not permit `.` to match surrogates.
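One way to see why a UTF-8-aware `.` can refuse surrogates: surrogate code points (U+D800..U+DFFF) have no well-formed UTF-8 encoding, so an engine that only matches valid UTF-8 sequences never matches their bytes. The sketch below is not ripgrep's implementation. It is a Python bytes regex, with the hypothetical name `UTF8_DOT`, that matches one well-formed UTF-8 scalar value using the byte ranges from the Unicode Standard's table of well-formed UTF-8 byte sequences. Python's `surrogatepass` error handler is used only to manufacture the ill-formed bytes.

```python
import re

# Hypothetical byte-level "dot": one well-formed UTF-8 scalar value, excluding
# "\n" like a typical `.`. The \xED row stops at \x9F, which is what rules out
# encoded surrogates (U+D800..U+DFFF would be ED A0 80..ED BF BF).
UTF8_DOT = re.compile(
    rb"[\x00-\x09\x0B-\x7F]"
    rb"|[\xC2-\xDF][\x80-\xBF]"
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"
    rb"|[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
)

# "a", then U+D83D forced into 3 bytes (ED A0 BD), then U+1F600 encoded properly.
haystack = (
    b"a"
    + "\ud83d".encode("utf-8", "surrogatepass")
    + "\U0001F600".encode("utf-8")
)

matches = UTF8_DOT.findall(haystack)
print(len(matches))  # 2: "a" and the emoji; the surrogate bytes never match
```

A byte-oriented `.` that accepts any 3-byte lead/continuation pattern would instead count the surrogate as a character, which is the kind of difference that can show up as a mismatch in match counts.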
-
I would call it a bug almost without question. Tangentially, regardless of UTF-8 validation, is `.` expected to match a single surrogate code point, or the entire pair?
-
Good question. I don't even know if it's well formed, though, haha! Usually `.` means "match a single code point," but certainly, in UTF-16, you want it to match the pair, not one surrogate. I guess you would want to carry that over to UTF-8 as well, but I don't know what the ramifications would be...
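To make the code point vs. surrogate pair distinction concrete, here is a small Python sketch (an illustration only, using Python's `re` over `str`, which operates on code points). U+1F600 is one code point but two UTF-16 code units. A string holding the two surrogates as separate code points is a different sequence, and there `.` matches each half. Encoding those surrogates to "UTF-8" one at a time yields 6 ill-formed bytes rather than the 4-byte well-formed encoding.

```python
import re

emoji = "\U0001F600"  # one code point, U+1F600

# In UTF-16 the same character takes two code units: a surrogate pair.
units = emoji.encode("utf-16-be")
print(len(units) // 2)  # 2
print(units.hex(" "))  # d8 3d de 00

# Over code points, `.` matches the whole character once.
print(len(re.findall(".", emoji)))  # 1

# The two surrogates stored as separate code points: `.` matches each one.
pair = "\ud83d\ude00"
print(len(re.findall(".", pair)))  # 2

# Reassembling the pair through UTF-16 recovers the single code point.
joined = pair.encode("utf-16-be", "surrogatepass").decode("utf-16-be")
print(joined == emoji)  # True

# Carried over to UTF-8: 4 well-formed bytes vs. 6 bytes of encoded surrogates.
print(len(emoji.encode("utf-8")))  # 4
print(len(pair.encode("utf-8", "surrogatepass")))  # 6
```

Whether `.` should treat those 6 bytes as one match, two matches, or none is exactly the open design question; a strict UTF-8 engine sidesteps it by treating them as invalid input.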