Today's adventure: parsing Unicode's awful xml from CLDR with sed, because xml is unusably awful to deal with.
/<calendar.*gregorian/,/<\/calendar/{ /<monthWidth.*wide/,/<\/monthWidth/{ /<month /p } }
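A minimal sketch of how the script above behaves, assuming it is run with `sed -n` (the tweet does not show the invocation) and that the input is pretty-printed with one element per line. The `sample` data and the `extract_wide_months` function name are hypothetical; the sample is a heavily trimmed imitation of a CLDR locale file, not real CLDR content. The script is spread over several lines here only for readability.

```shell
#!/bin/sh
# Hypothetical, trimmed-down input in the style of a CLDR locale file.
sample='<calendars>
  <calendar type="gregorian">
    <months>
      <monthContext type="format">
        <monthWidth type="abbreviated">
          <month type="1">Jan</month>
        </monthWidth>
        <monthWidth type="wide">
          <month type="1">January</month>
          <month type="2">February</month>
        </monthWidth>
      </monthContext>
    </months>
  </calendar>
  <calendar type="hebrew">
    <months>
      <monthContext type="format">
        <monthWidth type="wide">
          <month type="1">Tishri</month>
        </monthWidth>
      </monthContext>
    </months>
  </calendar>
</calendars>'

# Hypothetical wrapper: the same nested line ranges as the one-liner.
# The outer range selects the gregorian calendar block, the inner range
# selects the wide monthWidth block, and only <month ...> lines are printed.
extract_wide_months() {
  sed -n '
    /<calendar.*gregorian/,/<\/calendar/{
      /<monthWidth.*wide/,/<\/monthWidth/{
        /<month /p
      }
    }
  '
}

printf '%s\n' "$sample" | extract_wide_months
```

On this sample only the January and February lines are printed, with their original indentation. The abbreviated `Jan` line falls outside the inner range and the `Tishri` line falls outside the outer range, so both are skipped.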
Replying to @RichFelker
Wait... are you... parsing....xml...with..reg....e...x..... https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 pic.twitter.com/T5s5IcRCqL
Replying to @FakeUnicode
Yes. Because all the alternatives are just as awful.
Replying to @RichFelker @FakeUnicode
Of course you can't parse arbitrary XML (or [X]HTML) with regex. (Ignoring the fact that bounded-nesting-depth is actually a regular language and _could_ be parsed with a hideously huge regex...) On the other hand...
Replying to @RichFelker @FakeUnicode
XML that has to meet a fixed form to be meaningful can be parsed with regex, assuming a particular pretty-printing (and if that ever changes, a black-box xml pretty-printer can fix it without having to dirty hands on xml outside the black box).
Replying to @RichFelker
If you trust Unicode not to change the sigils on you, causing your rune casting to 𝖶Âke the beast, so be it, bu̺t 𐌜e it Ớn ͨyȮur ȟeͮȀ𝗱 Ṧ͟Н͌oựld ̃̇Ύ̜̅̉o̒ꓴ ̸b̬ͭ͏r̢̼ͅᶧ🄽Ԍ̻ r͓͙ǘ̒ͬі͒ͅṄͤ̃ ̄Ŧ̱Ὁ̓ ̈́̏ͭ͢ͅᵁ͂𝚂͖́ ̮̥̦ͬͅἊ̡̡̢̪ͯ̓ℓ̷̤̬̋͝҉̙̖𝙻̺̓̀͏̐ͬ̉!̛͇ͤͫ̂
If they change anything, the code consuming the data has to be changed anyway. Aside from malformed nesting constructs, the sed line-match patterns are equivalent to chained XML element selectors for matching elements.