Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval

Srivastava, Avikalp; Datt, Madhav

doi:10.1145/3132847.3133162

arXiv:1712.05574 (cs)

[Submitted on 15 Dec 2017]

Title: Soft Seeded SSL Graphs for Unsupervised Semantic Similarity-based Retrieval

Authors: Avikalp Srivastava, Madhav Datt

Abstract: Semantic similarity-based retrieval plays an increasingly important role in many IR systems, such as modern web search, question answering, and similar-document retrieval. Improvements in retrieving semantically similar content matter to applications such as Quora, Stack Overflow, and Siri. We propose a novel unsupervised model for semantic similarity-based content retrieval, in which we construct a semantic flow graph for each query and introduce the concept of "soft seeding" in graph-based semi-supervised learning (SSL) to convert it into an unsupervised model.
We demonstrate the effectiveness of our model on an equivalent-question retrieval problem on the Stack Exchange QA dataset, where our unsupervised approach significantly outperforms state-of-the-art unsupervised models and produces results comparable to the best supervised models. Our research provides a method for tackling semantic similarity-based retrieval without any training data, and it extends seamlessly to QA communities in different domains as well as to other semantic equivalence tasks.
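
The abstract does not spell out how soft seeding works, so the following is only an illustrative sketch of one plausible reading: a generic graph-based score propagation in which each node starts from a graded prior score (a "soft" seed) instead of a hard label, and repeatedly blends that prior with the scores of its neighbours. The function name `propagate`, the retention weight `alpha`, the toy edge weights, and the seed values are all assumptions made for illustration; this is not the authors' algorithm.

```python
def propagate(weights, seed_scores, alpha=0.8, iterations=50):
    """Blend each node's soft seed score with its neighbours' scores.

    weights      -- symmetric matrix of non-negative edge weights (hypothetical)
    seed_scores  -- graded prior score per node in [0, 1] (hypothetical)
    alpha        -- share of the score taken from neighbours at each step
    """
    n = len(seed_scores)
    # Row-normalise the edge weights so each node takes a weighted
    # average over its neighbours.
    norm = []
    for row in weights:
        total = sum(row)
        norm.append([w / total if total else 0.0 for w in row])

    scores = list(seed_scores)
    for _ in range(iterations):
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seed_scores[i]
            for i in range(n)
        ]
    return scores


# Toy graph: node 0 stands for the query, nodes 1-3 for candidate documents.
weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.1, 0.8, 0.0, 0.1],
    [0.0, 0.0, 0.1, 0.0],
]
seed_scores = [1.0, 0.6, 0.2, 0.1]  # soft seeds: graded priors, not hard labels

scores = propagate(weights, seed_scores)
# Rank the candidates (nodes 1-3) by their propagated score.
ranking = sorted(range(1, 4), key=lambda i: scores[i], reverse=True)
print(ranking)
```

In this sketch no node is clamped to a fixed label, so the procedure needs no labelled training data; the seeds only bias the final scores, and the graph structure does the rest.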

Comments:	Published in Proceedings of the 2017 ACM Conference on Information and Knowledge Management (CIKM '17)
Subjects:	Information Retrieval (cs.IR)
Cite as:	arXiv:1712.05574 [cs.IR]
	(or arXiv:1712.05574v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.1712.05574
Related DOI:	https://doi.org/10.1145/3132847.3133162

Submission history

From: Avikalp Srivastava
[v1] Fri, 15 Dec 2017 08:22:23 UTC (484 KB)
