I have a large repository of PDFs that I've wanted to analyze by constructing a graph over the similarity of the *questions* they ask instead of over bibliographic citations. NLP folks, how difficult do you think this task is?
-
Replying to @generativist
Sounds like you got many positive responses, so I guess it's my job to give the negative. Assuming these are academic papers, it seems that sometimes even undergrads cannot extract the main questions from closely read text. I'd be sceptical of NLP delivering any deep solution to this.
2 replies 0 retweets 3 likes -
Replying to @kaznatcheev @generativist
Especially if you want to cluster across disciplines, where people don't share a common vocabulary for asking similar questions. Within a discipline, you can get clustering by word use freq., but I doubt that'd be all that much more informative than clustering by references.
1 reply 0 retweets 2 likes -
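The word-frequency baseline mentioned above can be sketched as follows. This is a minimal illustration only, assuming plain-text extracts of each paper are already available; the function names, the `threshold` value, and the three toy documents are all hypothetical and not from the thread. It weights words by TF-IDF, compares documents by cosine similarity, and keeps an edge when the score clears the threshold.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a sparse {word: tf-idf weight} dict."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for c in counts.values() for word in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values())
        vectors[name] = {
            word: (n / total) * math.log(n_docs / doc_freq[word])
            for word, n in c.items()
        }
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_graph(docs, threshold=0.05):
    """Return {(doc_a, doc_b): score} for pairs scoring at least `threshold`.

    `threshold` is an arbitrary illustrative cutoff, not a recommended value.
    """
    vectors = tfidf_vectors(docs)
    edges = {}
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Toy, made-up "questions" standing in for text extracted from papers.
docs = {
    "ecology": "how do populations of predators and prey cooperate and compete",
    "economics": "how do firms cooperate and compete in markets",
    "optics": "how do lenses focus light",
}

edges = similarity_graph(docs)
for pair, score in edges.items():
    print(pair, round(score, 3))
```

In this toy data the ecology and economics questions are linked only because they happen to share the words "cooperate" and "compete". Two fields asking the same question in different vocabularies would get no edge, which is the limitation raised in this thread.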
Replying to @kaznatcheev
That's *sorta* related to my motivation. Citations remain fairly well-partitioned by discipline; questions aren't. But I think you point at the main difficulty: they use different words to ask similar questions. And like you said, even experts can't navigate that easily.
2 replies 0 retweets 2 likes -
Replying to @generativist
I suspect this is nearly impossible to do reliably, whether by experts or NLP. It'd be interesting to have a training set of new interdisciplinary fields that were spawned by noting that 2 unrelated fields ask similar questions or need similar tools. But that might be a very small training set. :(
1 reply 0 retweets 1 like
Yeah. I suspect this is true, if only because it doesn't exist... (Although that evidence is weak given the history of availability.)