Bonnie had an old paper surveying this topic https://drum.lib.umd.edu/bitstream/handle/1903/807/CS-TR-3615.pdf?sequence=2 … Not sure how far the field has progressed, but this '98 paper is as good as any to start with I suppose. Cheers
If you're set on an algorithmic approach, disregard this: https://arxiv.org/abs/1810.04805 I've tried multi-language BERT models, and I'm often surprised at how well they work, if they fit your use case.
Would love to see Casey get into AI, but too much on his plate already I guess.
Every production search engine I’ve seen uses language-specific tokenizers and stemmers.