A Hierarchical Graphical Model for Record Linkage

Ravikumar, Pradeep; Cohen, William

arXiv:1207.4180 (cs)

[Submitted on 12 Jul 2012]

Abstract: The task of matching co-referent records is known, among other names, as record linkage. For large record-linkage problems, there is often little or no labeled data available, but the unlabeled data shows a reasonably clear structure. For such problems, unsupervised or semi-supervised methods are preferable to supervised methods. In this paper, we describe a hierarchical graphical model framework for the record-linkage problem in an unsupervised setting. In addition to proposing new methods, we also cast existing unsupervised probabilistic record-linkage methods in this framework. Some of the techniques we propose to minimize overfitting in this model are of interest in the general graphical model setting. We describe a method for incorporating monotonicity constraints in a graphical model. We also outline a bootstrapping approach that uses "single-field" classifiers to noisily label latent variables in a hierarchical model. Experimental results show that our proposed unsupervised methods perform quite competitively even with fully supervised record-linkage methods.

Comments:	Appears in Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence (UAI2004)
Subjects:	Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Report number:	UAI-P-2004-PG-454-461
Cite as:	arXiv:1207.4180 [cs.LG]
	(or arXiv:1207.4180v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1207.4180

Submission history

From: Pradeep Ravikumar [via AUAI proxy]
[v1] Thu, 12 Jul 2012 19:48:03 UTC (440 KB)
