See: https://mathiasbynens.be/notes/es-unicode-property-escapes#word … Transpiled version: https://mothereff.in/regexpu#input=const+regex+%3D+%2F%28%5B%5Cp%7BAlphabetic%7D%5Cp%7BMark%7D%5Cp%7BDecimal_Number%7D%5Cp%7BConnector_Punctuation%7D%5Cp%7BJoin_Control%7D%5D%2B%29%2Fgu%3B%0Aconst+text+%3D+%60%0AAmharic%3A+%E1%8B%A8%E1%8A%94+%E1%88%9B%E1%8A%95%E1%8B%A3%E1%89%A0%E1%89%A2%E1%8B%AB+%E1%88%98%E1%8A%AA%E1%8A%93+%E1%89%A0%E1%8B%93%E1%88%A3%E1%8B%8E%E1%89%BD+%E1%89%B0%E1%88%9E%E1%88%8D%E1%89%B7%E1%88%8D%0ABengali%3A+%E0%A6%86%E0%A6%AE%E0%A6%BE%E0%A6%B0+%E0%A6%B9%E0%A6%AD%E0%A6%BE%E0%A6%B0%E0%A6%95%E0%A7%8D%E0%A6%B0%E0%A6%BE%E0%A6%AB%E0%A7%8D%E0%A6%9F+%E0%A6%95%E0%A7%81%E0%A6%81%E0%A6%9A%E0%A7%87+%E0%A6%AE%E0%A6%BE%E0%A6%9B-%E0%A6%8F+%E0%A6%AD%E0%A6%B0%E0%A6%BE+%E0%A6%B9%E0%A7%9F%E0%A7%87+%E0%A6%97%E0%A7%87%E0%A6%9B%E0%A7%87%0AGeorgian%3A+%E1%83%A9%E1%83%94%E1%83%9B%E1%83%98+%E1%83%AE%E1%83%9D%E1%83%9B%E1%83%90%E1%83%9A%E1%83%93%E1%83%98+%E1%83%A1%E1%83%90%E1%83%B0%E1%83%90%E1%83%94%E1%83%A0%E1%83%9D+%E1%83%91%E1%83%90%E1%83%9A%E1%83%98%E1%83%A8%E1%83%96%E1%83%94+%E1%83%A1%E1%83%90%E1%83%95%E1%83%A1%E1%83%94%E1%83%90+%E1%83%92%E1%83%95%E1%83%94%E1%83%9A%E1%83%97%E1%83%94%E1%83%95%E1%83%96%E1%83%94%E1%83%91%E1%83%98%E1%83%97%0AMacedonian%3A+%D0%9C%D0%BE%D0%B5%D1%82%D0%BE+%D0%BB%D0%B5%D1%82%D0%B0%D1%87%D0%BA%D0%BE+%D0%B2%D0%BE%D0%B7%D0%B8%D0%BB%D0%BE+%D0%B5+%D0%BF%D0%BE%D0%BB%D0%BD%D0%BE+%D1%81%D0%BE+%D1%98%D0%B0%D0%B3%D1%83%D0%BB%D0%B8%0AVietnamese%3A+T%C3%A0u+c%C3%A1nh+ng%E1%BA%A7m+c%E1%BB%A7a+t%C3%B4i+%C4%91%E1%BA%A7y+l%C6%B0%C6%A1n%0A%60%3B%0A%0Alet+match%3B%0Awhile+%28match+%3D+regex.exec%28text%29%29+%7B%0A%09const+word+%3D+match%5B1%5D%3B%0A%09console.log%28%60Matched+word+with+length+%24%7B+word.length+%7D%3A+%24%7B+word+%7D%60%29%3B%0A%7D&unicodePropertyEscape=1 … @FakeUnicode
Replying to @mathias
Thanks @mathias, but I think the solution below with XRegExp is better: https://twitter.com/dgoosens/status/844579612038443010 @FakeUnicode
Replying to @dgoosens
For the record: with XRegExp, /[\w]*/u becomes '[\\p{L}\\p{N}]*'. @mathias @FakeUnicode
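For comparison, the same character class can be written natively in engines that support Unicode property escapes. This is a minimal sketch, assuming an ES2018-capable runtime; the names `wordLike`, `sample`, `words`, and `asciiOnly` are illustrative, and `+` is used instead of `*` only to avoid empty matches.

```javascript
// Native counterpart of the XRegExp source string '[\\p{L}\\p{N}]*'
// (assumption: the engine supports \p{…} with the `u` flag).
const wordLike = /[\p{L}\p{N}]+/gu;

// Sample text taken from the Macedonian line in the linked demo.
const sample = 'Моето летачко возило';

const words = sample.match(wordLike);
console.log(words); // → [ 'Моето', 'летачко', 'возило' ]

// Plain \w stays ASCII-only even with the `u` flag, so it finds nothing here.
const asciiOnly = sample.match(/\w+/gu);
console.log(asciiOnly); // → null
```

Which letters `\p{L}` covers still depends on the Unicode version the engine ships with.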
My problem is: These properties change over time. JS \p regex will be wildly different on different browsers/libraries.
Replying to @FakeUnicode @dgoosens
I’ll update `\p{…}` tests in test262 yearly = strong incentive for engines to update asap following Unicode updates
Still, for things like client-side verification of "is everything entered in a form a letter?" there'll be divergence :(
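The kind of client-side check in question might look like the sketch below. The helper name `isAllLetters` is hypothetical, and the result depends on the Unicode data bundled with whichever engine runs it.

```javascript
// Hypothetical helper: true only if every code point in `input` is a letter
// according to the Unicode version built into *this* engine.
function isAllLetters(input) {
  return /^\p{L}+$/u.test(input);
}

console.log(isAllLetters('T\u00E0u')); // → true  ("Tàu", precomposed à)
console.log(isAllLetters('abc1'));     // → false (digit is not \p{L})
console.log(isAllLetters(''));         // → false (empty input)
```

A letter added in a newer Unicode version would pass this check in an updated browser and fail in an older one.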
Replying to @FakeUnicode @dgoosens
What more can we do? (Btw, the proposal explicitly lists the properties to be supported for the same reason.)
¯\_(ツ)_/¯ When making such finely crafted footguns, one must also sell crutches.
Scenario: client (JS) and server (Java) both use \p{L}, and unit tests pass. As new Unicode versions are released, the two start to disagree more.
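One way to observe this kind of drift is to probe what a given runtime's `\p{L}` accepts and compare the answers across environments. This is a rough sketch only: `isLetter` and `probes` are hypothetical names, and the claim that Adlam was added in Unicode 9.0 is stated from memory and should be treated as an assumption.

```javascript
// Hypothetical probe: does this engine's \p{L} include the given code point?
function isLetter(codePoint) {
  return /^\p{L}$/u.test(String.fromCodePoint(codePoint));
}

// Sample code points. U+0041 has been a letter since the earliest versions;
// U+1E900 (ADLAM CAPITAL LETTER ALIF) is assumed to date from Unicode 9.0.
const probes = [
  { label: 'LATIN CAPITAL LETTER A', codePoint: 0x41 },
  { label: 'ADLAM CAPITAL LETTER ALIF', codePoint: 0x1E900 },
];

// The second answer depends on the Unicode version the engine ships with.
for (const { label, codePoint } of probes) {
  console.log(label, isLetter(codePoint));
}
```

Running the same probes on the server side (for example with Java's `\p{L}`) and diffing the results would show where the two implementations disagree.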
What would be the fix, other than "Don't assume different implementations of Unicode \p regex use the same version"?
Updating spec tests + working with engines to ensure they update to the latest Unicode version in tandem.