Cost-effective Variational Active Entity Resolution

Bogatu, Alex; Paton, Norman W.; Douthwaite, Mark; Davie, Stuart; Freitas, Andre

arXiv:2011.10406 (cs)

[Submitted on 20 Nov 2020 (v1), last revised 26 Feb 2021 (this version, v3)]

Abstract: Accurately identifying different representations of the same real-world entity is an integral part of data cleaning, and many methods have been proposed to accomplish it. The challenges of this entity resolution task that demand so much research attention are often rooted in the task-specificity and user-dependence of the process. Adopting deep learning techniques has the potential to lessen these challenges. In this paper, we set out to devise an entity resolution method that builds on the robustness conferred by deep autoencoders to reduce human-involvement costs. Specifically, we reduce the cost of training deep entity resolution models by performing unsupervised representation learning. This unveils a transferability property of the resulting model that can further reduce the cost of applying the approach to new datasets by means of transfer learning. Finally, we reduce the cost of labelling training data through an active learning approach that builds on the properties conferred by the use of deep autoencoders. Empirical evaluation confirms that the approach meets these cost-reduction goals while achieving effectiveness comparable to state-of-the-art alternatives.

Subjects:	Machine Learning (cs.LG); Databases (cs.DB)
Cite as:	arXiv:2011.10406 [cs.LG]
	(or arXiv:2011.10406v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.10406
Journal reference:	2021 IEEE 37th International Conference on Data Engineering (ICDE)

Submission history

From: Alex Bogatu
[v1] Fri, 20 Nov 2020 13:47:11 UTC (1,943 KB)
[v2] Mon, 22 Feb 2021 19:29:35 UTC (1,954 KB)
[v3] Fri, 26 Feb 2021 11:13:27 UTC (1,954 KB)
