I have a large repository of PDFs that I've wanted to analyze by constructing a graph over the similarity of the *questions* they ask, instead of over bibliographic citations. NLP folks, how difficult do you think this task is?
Replying to @generativist
Sounds like you got many positive responses, so I guess it's my job to give the negative. Assuming these are academic papers, even undergrads sometimes can't extract the main questions from a text they've read closely. I'd be sceptical of NLP delivering any deep solution to this.
Replying to @kaznatcheev @generativist
Especially if you want to cluster across disciplines, where people don't share a common vocabulary for asking similar questions. Within a discipline, you can get clustering by word-use frequency, but I doubt that would be much more informative than clustering by references.
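A minimal sketch of the within-discipline word-frequency baseline, assuming each PDF has already been converted to plain text: each document becomes a bag-of-words count vector, pairs are compared by cosine similarity, and documents are linked in a graph when their similarity reaches a threshold. The naive tokenizer, the `threshold=0.3` default, and all function names are hypothetical choices for illustration, not anyone's actual pipeline.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count alphabetic tokens (a deliberately naive tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    # Counter returns 0 for missing words, so unshared words add nothing to the dot product.
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_graph(docs: dict[str, str], threshold: float = 0.3) -> dict[str, set[str]]:
    """Adjacency sets linking documents whose word-frequency cosine similarity >= threshold.

    The threshold value is a hypothetical placeholder, not a tuned parameter.
    """
    vectors = {name: bag_of_words(text) for name, text in docs.items()}
    graph: dict[str, set[str]] = {name: set() for name in docs}
    for u, v in combinations(docs, 2):
        if cosine(vectors[u], vectors[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


if __name__ == "__main__":
    # Toy stand-ins for extracted paper text (invented for illustration).
    papers = {
        "econ": "How do agents update beliefs when signals are noisy?",
        "psych": "How do people update beliefs after noisy feedback?",
        "bio": "Which genes regulate flowering time in plants?",
    }
    print(similarity_graph(papers))
```

Here the first two toy sentences are linked only because they happen to share words like "update" and "beliefs". Two papers posing the same question in different disciplinary vocabularies would score near zero, which is exactly the limitation raised in the thread.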
Replying to @kaznatcheev
That's *sorta* related to my motivation. Citations remain fairly well partitioned by discipline; questions aren't. But I think you point at the main difficulty: they use different words to pose similar questions. And like you said, even experts can't navigate that easily.
Specifically, this idea came from reading a lot of literature on beliefs. The different disciplines use different words and concepts (even for "belief"), but many of the rhetorical questions, especially in application sections, overlap in my head.