Optimal level set estimation for non-parametric tournament and crowdsourcing problems

Graf, Maximilian; Carpentier, Alexandra; Verzelen, Nicolas

arXiv:2408.15356 (stat)

[Submitted on 27 Aug 2024]

Abstract: Motivated by crowdsourcing, we consider a problem where we partially observe the correctness of the answers of $n$ experts on $d$ questions. In this paper, we assume that both the experts and the questions can be ordered, namely that the matrix $M$ containing the probability that expert $i$ correctly answers question $j$ is bi-isotonic up to a permutation of its rows and columns. When $n=d$, this also encompasses the strongly stochastic transitive (SST) model from the tournament literature. Here, we focus on the problem of distinguishing small entries of $M$ from large entries of $M$, which is key in crowdsourcing for the efficient allocation of workers to questions. More precisely, we aim at recovering one (or several) level sets of the matrix at level $p$ up to a precision $h$, namely recovering, respectively, the sets of positions $(i,j)$ in $M$ such that $M_{ij}>p+h$ and $M_{ij}<p-h$. We consider, as a loss measure, the number of misclassified entries. As our main result, we construct an efficient polynomial-time algorithm that turns out to be minimax optimal for this classification problem. This contrasts sharply with the existing literature on the SST model where, for the stronger reconstruction loss, statistical-computational gaps have been conjectured. More generally, this sheds light on the nature of statistical-computational gaps for permutation models.
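To make the loss in the abstract concrete, the following is a minimal sketch of how the number of misclassified entries could be counted for a single level $p$ and precision $h$. It is an illustration under stated assumptions, not the authors' algorithm: the function name `level_set_loss`, the boolean encoding of the estimate, and the toy matrix are all hypothetical, and the sketch assumes that entries with $p-h \le M_{ij} \le p+h$ are never counted as errors.

```python
import numpy as np


def level_set_loss(M, estimate, p, h):
    """Count misclassified entries for level p and precision h.

    M        : matrix of success probabilities (hypothetical toy input).
    estimate : boolean matrix, True where an entry is declared "above level p".
    Assumption: entries inside the band [p - h, p + h] never count as errors.
    """
    M = np.asarray(M, dtype=float)
    estimate = np.asarray(estimate, dtype=bool)

    clearly_above = M > p + h   # entries that must be declared "above"
    clearly_below = M < p - h   # entries that must be declared "below"

    # An error is a clearly-above entry declared below, or the reverse.
    errors = (clearly_above & ~estimate) | (clearly_below & estimate)
    return int(errors.sum())


# Toy bi-isotonic matrix: entries decrease along every row and column.
M = np.array([[0.9, 0.7, 0.4],
              [0.8, 0.5, 0.2],
              [0.6, 0.3, 0.1]])

# An estimate that is correct outside the band (0.35, 0.65).
good = np.array([[True, True, True],
                 [True, False, False],
                 [False, False, False]])

# An estimate that flips one clearly-above and one clearly-below entry.
bad = np.array([[True, False, True],
                [True, False, True],
                [False, False, False]])

loss_good = level_set_loss(M, good, p=0.5, h=0.15)
loss_bad = level_set_loss(M, bad, p=0.5, h=0.15)
```

In this toy case the entries 0.4, 0.5, and 0.6 lie inside the band, so either label is accepted for them; only the entries 0.7 and 0.2 are misclassified by the second estimate.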

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
MSC classes:	62C20
Cite as:	arXiv:2408.15356 [stat.ML]
	(or arXiv:2408.15356v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2408.15356

Submission history

From: Maximilian Graf
[v1] Tue, 27 Aug 2024 18:28:31 UTC (211 KB)
