But it's not quite *parsing*. There exists invalid HTML that will be recognized as valid by even the most advanced regular expressions, because regular expressions can't properly match nested expressions.
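To make that concrete, here is a minimal sketch of the point above. The pattern `DIV_PATTERN` and the helper names are hypothetical, chosen only for illustration: a plausible-looking regex accepts markup with an unclosed `<div>`, while a simple depth counter rejects it.

```python
import re

# Hypothetical pattern: "an opening <div>, anything at all, then a closing </div>".
DIV_PATTERN = re.compile(r"<div>.*</div>", re.DOTALL)


def regex_says_valid(text: str) -> bool:
    """Return True if the whole string matches the naive pattern."""
    return DIV_PATTERN.fullmatch(text) is not None


def balanced_divs(text: str) -> bool:
    """Return True only if every <div> is closed in the right order."""
    depth = 0
    for tag in re.findall(r"</?div>", text):
        if tag == "<div>":
            depth += 1
        else:
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
    return depth == 0


unbalanced = "<div><div>hello</div>"  # the outer <div> is never closed
print(regex_says_valid(unbalanced))  # True: the regex is fooled
print(balanced_divs(unbalanced))     # False: the counter is not
```

The counter still uses a regex to find the tags, which is the lexing half of the job; the nesting check is the part a plain regular expression cannot do for arbitrary depth.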
-
-
-
I totally get the joke tho :3
New conversation -
-
-
Regular expressions and finite state automata are equivalent in expressive power.
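As a small illustration of that equivalence (one hand-picked language, not a proof), the sketch below accepts the strings of `(ab)*` twice: once with a hand-written two-state automaton and once with Python's `re`. The transition table and function names are made up for this example.

```python
import re

# Two-state automaton for (ab)*.
# State 0: expecting "a" (also the accepting state). State 1: expecting "b".
TRANSITIONS = {(0, "a"): 1, (1, "b"): 0}


def dfa_accepts(text: str) -> bool:
    """Run the automaton; a missing transition means rejection."""
    state = 0
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False
    return state == 0


AB_STAR = re.compile(r"(ab)*")


def regex_accepts(text: str) -> bool:
    """Accept only if the whole string matches (ab)*."""
    return AB_STAR.fullmatch(text) is not None


for sample in ["", "ab", "abab", "a", "ba", "abb"]:
    print(repr(sample), dfa_accepts(sample), regex_accepts(sample))
```

Both functions agree on every sample; the automaton simply has no memory beyond which of its two states it is in.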
-
If you are in Computer Science class, the answer to "Can regular expressions parse a context-sensitive language?" is "No". Everywhere else, you're probably using Perl-compatible regexen. They can use backtracking to express context sensitivity OR blow up your program.
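A hedged example of the "everywhere else" case: Python's `re` is a backtracking engine with backreferences, so the pattern below (hypothetical name `DOUBLED`) matches exactly the strings of the form ww over `a` and `b`, a language that is not even context-free. The "blow up your program" half is deliberately not run here; nested quantifiers such as `(a+)+$` on a long non-matching input are the usual way to trigger it.

```python
import re

# ([ab]+) captures some non-empty word w; \1 then demands the same w again.
DOUBLED = re.compile(r"([ab]+)\1")


def is_doubled_word(text: str) -> bool:
    """Return True if text is some word over {a, b} written twice in a row."""
    return DOUBLED.fullmatch(text) is not None


print(is_doubled_word("abab"))    # True:  w = "ab"
print(is_doubled_word("aabaab"))  # True:  w = "aab"
print(is_doubled_word("abba"))    # False: no w with ww == "abba"
```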
End of conversation
New conversation -
-
-
It's also a response to the knee-jerk "OH NO! You can't do that!" reaction whenever people see a regular expression anywhere near HTML, without considering how it's being used in that particular instance.
-
-
-
I get the joke, but the main reason to use an existing parser is that it makes this question irrelevant, isn't it?
-
-
-
Parsing is lexing. Okay, got it :)
-
-
-
Using a pushdown automaton, of course. In the ideal case. In the real case, it's always a Turing machine, full of heuristics. Regular expressions, meanwhile, are equivalent to finite state machines (usually).
-
When I say "usually" I mean that there are some regular expression frameworks which actually make them more powerful than formal regular expressions, so they may parse at least context-free languages. And I think I've even seen something Turing-complete.
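For the "ideal case", a rough sketch of the pushdown idea: a regex does the lexing and an explicit stack does the nesting. The tag grammar here is a deliberately tiny assumption (lowercase names, no attributes, no void elements), nowhere near real HTML, and `tags_nest_properly` is a made-up name.

```python
import re

# Lexer: group 1 is "/" for a closing tag, group 2 is the tag name.
TAG = re.compile(r"<(/?)([a-z]+)>")


def tags_nest_properly(text: str) -> bool:
    """Return True if every opening tag is closed by the matching tag, in order."""
    stack = []
    for closing, name in TAG.findall(text):
        if not closing:
            stack.append(name)      # push on an opening tag
        elif not stack or stack.pop() != name:
            return False            # close without a matching open
    return not stack                # leftover opens mean unclosed tags


print(tags_nest_properly("<b><i>x</i></b>"))  # True
print(tags_nest_properly("<b><i>x</b></i>"))  # False: wrong closing order
```

The stack is exactly the extra memory a finite state machine lacks; the heuristics of a real HTML parser are everything this sketch leaves out.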
End of conversation
New conversation -
-
-
Unless there is a joke I don't get: HTML is not a regular language and hence cannot be parsed by regular expressions. I love regexps, but they can only parse certain things, and HTML is not one of them. http://www.welbog.ca/glue/53/XML_is_not_regular …
-
The joke, I think, is that approximately everybody out there in fact uses regular expressions to parse HTML, because it’s fine almost all the time, and most of the cases where it isn’t are either a) not valid HTML or b) pointlessly pathological and should be ignored (or rewritten).
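A sketch of what "fine almost all the time" can look like in practice, assuming the task is pulling `href` values out of simple markup. The pattern `HREF` and the class `LinkCollector` are invented for this example; `html.parser.HTMLParser` is the standard-library parser.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: an <a ...> tag with a double-quoted href attribute.
HREF = re.compile(r'<a\s+[^>]*href="([^"]*)"')


def links_by_regex(html: str) -> list:
    """Extract href values with the regex only."""
    return HREF.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def links_by_parser(html: str) -> list:
    """Extract href values with the standard-library parser."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


page = '<p><a href="/one">1</a> and <a class="x" href="/two">2</a></p>'
print(links_by_regex(page))   # ['/one', '/two']
print(links_by_parser(page))  # ['/one', '/two']

odd = "<a href='/three'>3</a>"  # single quotes: valid HTML, missed by the regex
print(links_by_regex(odd))    # []
print(links_by_parser(odd))   # ['/three']
```

On the ordinary page both approaches agree; the single-quoted attribute is the kind of case where only the parser keeps working.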