WebSets: Extracting Sets of Entities from the Web Using Unsupervised Information Extraction

Dalvi, Bhavana; Cohen, William W.; Callan, Jamie

Computer Science > Machine Learning

arXiv:1307.0261 (cs)

[Submitted on 1 Jul 2013]

Abstract: We describe an open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining clusters of distributionally similar terms and concept-instance pairs obtained with Hearst patterns. In contrast, our method relies on a novel approach for clustering terms found in HTML tables and then assigns concept names to these clusters using Hearst patterns. The method can be efficiently applied to a large corpus, and experimental results on several datasets show that our method can accurately extract large numbers of concept-instance pairs.
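The abstract describes two stages: cluster terms from HTML tables, then name each cluster using Hearst patterns. The naming stage can be illustrated with a minimal sketch. It assumes a single toy "X such as A, B and C" pattern matched by a regular expression, and it scores each candidate concept by how many distinct cluster members appear as its instances. The functions `hearst_pairs` and `label_cluster` and this scoring rule are hypothetical illustrations, not the WebSets algorithm itself.

```python
import re

# ASSUMPTION: one toy Hearst pattern, "<concept> such as <inst>, <inst> and <inst>".
# Real extractors use several patterns and noun-phrase chunking; this is a simplified regex.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:, | and | or ))*\w+)")
SEPARATORS = re.compile(r", | and | or ")


def hearst_pairs(sentences):
    """Extract (concept, instance) pairs from 'X such as A, B and C' phrases."""
    pairs = []
    for sent in sentences:
        for m in SUCH_AS.finditer(sent.lower()):
            concept = m.group(1)
            for inst in SEPARATORS.split(m.group(2)):
                pairs.append((concept, inst.strip()))
    return pairs


def label_cluster(cluster, pairs):
    """Return the concept whose Hearst instances overlap the cluster most, or None.

    HYPOTHETICAL scoring: number of distinct cluster members seen as instances
    of the concept; ties go to the lexicographically larger concept name.
    """
    members = {t.lower() for t in cluster}
    hits = {}
    for concept, inst in set(pairs):
        if inst in members:
            hits.setdefault(concept, set()).add(inst)
    if not hits:
        return None
    return max(hits, key=lambda c: (len(hits[c]), c))


if __name__ == "__main__":
    sentences = [
        "Fruits such as apple, banana and cherry are sweet.",
        "Companies such as apple and google hire engineers.",
    ]
    pairs = hearst_pairs(sentences)
    # "apple" matches both concepts, but "fruits" covers all three cluster members.
    print(label_cluster(["Apple", "Banana", "Cherry"], pairs))
```

Scoring by overlap lets an ambiguous term such as "apple" be outvoted by the rest of its cluster, which is one reason to label clusters rather than individual terms.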

Comments:	10 pages; International Conference on Web Search and Data Mining 2012
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL); Information Retrieval (cs.IR)
Cite as:	arXiv:1307.0261 [cs.LG]
	(or arXiv:1307.0261v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1307.0261

Submission history

From: Bhavana Dalvi
[v1] Mon, 1 Jul 2013 02:49:08 UTC (75 KB)
