Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Ibrahim, Shahana; Fu, Xiao; Kargas, Nikos; Huang, Kejun

arXiv:1909.12325 (cs)

[Submitted on 26 Sep 2019]

Abstract: The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labels from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle in practice, since they hinge on the availability of third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform the state-of-the-art algorithms under a variety of scenarios.
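To make the central quantity concrete, the following is a minimal sketch (not the authors' implementation) of how an empirical pairwise co-occurrence matrix for two annotators might be estimated from their responses. The function name `pairwise_cooccurrence`, the use of `None` for skipped items, and the toy data are all assumptions made for illustration. Under the Dawid-Skene assumption that annotators respond independently given the true label, the matrix estimated here has the form $A_m D A_\ell^\top$, where $A_m$ and $A_\ell$ are the two annotators' confusion matrices and $D$ is the diagonal matrix of class priors; the notation is likewise chosen here for illustration.

```python
from collections import Counter


def pairwise_cooccurrence(labels_m, labels_l, num_classes):
    """Estimate the joint distribution of two annotators' responses.

    labels_m, labels_l: per-item responses in {0, ..., num_classes - 1},
        or None where the annotator did not label the item (assumed encoding).
    Returns a num_classes x num_classes nested list whose (i, j) entry is the
    fraction of co-labeled items on which annotator m answered i and
    annotator l answered j, or None if the two share no labeled items.
    """
    # Count response pairs only over items that both annotators labeled.
    counts = Counter(
        (a, b)
        for a, b in zip(labels_m, labels_l)
        if a is not None and b is not None
    )
    total = sum(counts.values())
    if total == 0:
        return None
    return [
        [counts[(i, j)] / total for j in range(num_classes)]
        for i in range(num_classes)
    ]


# Toy responses from two hypothetical annotators on six items (binary labels).
ann_1 = [0, 1, 1, None, 0, 1]
ann_2 = [0, 1, 0, 1, None, 1]
R_12 = pairwise_cooccurrence(ann_1, ann_2, num_classes=2)
print(R_12)
```

Only second-order statistics of this kind are needed per annotator pair, which is the intuition behind the lower sample complexity relative to third-order (tensor) statistics.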

Comments: 28 pages, 5 figures, to appear in 33rd NeurIPS conference, Vancouver, Canada
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1909.12325 [cs.LG] (or arXiv:1909.12325v1 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.1909.12325

Submission history

From: Shahana Ibrahim
[v1] Thu, 26 Sep 2019 18:28:23 UTC (212 KB)
