Improving Machine-based Entity Resolution with Limited Human Effort: A Risk Perspective

Chen, Zhaoqiang; Chen, Qun; Hou, Boyi; Ahmed, Murtadha; Li, Zhanhuai

doi:10.1145/3242153.3242156

Computer Science > Databases

arXiv:1805.12502 (cs)

[Submitted on 31 May 2018 (v1), last revised 14 Aug 2018 (this version, v2)]



Abstract: Pure machine-based solutions usually struggle with challenging classification tasks such as entity resolution (ER). To alleviate this problem, a recent trend is to involve humans in the resolution process, most notably through crowdsourcing. However, it remains very challenging to effectively improve machine-based entity resolution with limited human effort. In this paper, we investigate the problem of human and machine cooperation for ER from a risk perspective. We propose selecting the machine-labeled instances at high risk of being mislabeled for manual verification. For this task, we present a risk model that takes into consideration the human-labeled instances as well as the output of machine resolution. Finally, we evaluate the performance of the proposed risk model on real data. Our experiments demonstrate that it identifies mislabeled instances with considerably higher accuracy than existing alternatives. Given the same human cost budget, it also achieves better resolution quality than the state-of-the-art approach based on active learning.
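
The selection step described in the abstract (send the machine-labeled instances most likely to be mislabeled to a human, within a fixed budget) can be illustrated with a minimal sketch. The abstract does not specify the risk model, so this sketch is not the authors' method: it substitutes a simple uncertainty baseline that treats the machine's match probability as calibrated and scores risk as the probability that the machine label is wrong. The names `machine_label`, `baseline_risk`, and `select_for_verification`, the 0.5 threshold, and the sample pairs are all hypothetical.

```python
from typing import List, Tuple


def machine_label(p_match: float, threshold: float = 0.5) -> bool:
    """Machine decision: label the pair a match if its score reaches the threshold."""
    return p_match >= threshold


def baseline_risk(p_match: float, threshold: float = 0.5) -> float:
    """Stand-in risk score (an assumption, not the paper's model).

    If p_match were a calibrated match probability, this would be the
    probability that the machine label is wrong.
    """
    return 1.0 - p_match if machine_label(p_match, threshold) else p_match


def select_for_verification(pairs: List[Tuple[str, float]], budget: int) -> List[str]:
    """Return the ids of the `budget` pairs with the highest risk score."""
    ranked = sorted(pairs, key=lambda item: baseline_risk(item[1]), reverse=True)
    return [pair_id for pair_id, _ in ranked[:budget]]


# Hypothetical record pairs with machine match probabilities.
pairs = [("a-b", 0.97), ("a-c", 0.55), ("b-d", 0.08), ("c-d", 0.42)]
chosen = select_for_verification(pairs, budget=2)
print(chosen)
```

With these sample scores, the two pairs closest to the decision threshold are chosen for manual verification. According to the abstract, the proposed risk model also takes the human-labeled instances into account, which this baseline does not.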

Comments:	5 pages, 3 figures
Subjects:	Databases (cs.DB)
Cite as:	arXiv:1805.12502 [cs.DB]
	(or arXiv:1805.12502v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1805.12502
Related DOI:	https://doi.org/10.1145/3242153.3242156

Submission history

From: Zhaoqiang Chen
[v1] Thu, 31 May 2018 14:54:55 UTC (245 KB)
[v2] Tue, 14 Aug 2018 09:12:46 UTC (253 KB)

