Pessimistic Off-Policy Optimization for Learning to Rank

Cief, Matej; Kveton, Branislav; Kompan, Michal

doi:10.3233/FAIA240703

Computer Science > Machine Learning

arXiv:2206.02593 (cs)

[Submitted on 6 Jun 2022 (v1), last revised 23 Aug 2024 (this version, v4)]

Title: Pessimistic Off-Policy Optimization for Learning to Rank

Authors: Matej Cief, Branislav Kveton, Michal Kompan

Abstract: Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address this challenge, we study pessimistic off-policy optimization for learning to rank. The key idea is to compute lower confidence bounds on parameters of click models and then return the list with the highest pessimistic estimate of its value. This approach is computationally efficient, and we analyze it. We study its Bayesian and frequentist variants and overcome the limitation of unknown prior by incorporating empirical Bayes. To show the empirical effectiveness of our approach, we compare it to off-policy optimizers that use inverse propensity scores or neglect uncertainty. Our approach outperforms all baselines and is both robust and general.
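
The key idea in the abstract (lower confidence bounds on click-model parameters, then the list with the highest pessimistic value) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a very simple click model in which each item has one click probability and the value of a list is the sum of its items' probabilities, and it uses a Hoeffding-style frequentist bound as a stand-in. The function names, the `delta` parameter, and the logged counts are all hypothetical.

```python
import math


def hoeffding_lcb(clicks, impressions, delta=0.05):
    """Lower confidence bound on a click probability via Hoeffding's inequality.

    Assumption: clicks on an item are i.i.d. Bernoulli draws, which is an
    illustrative simplification and not a model taken from the paper.
    """
    if impressions == 0:
        return 0.0  # no data: the most pessimistic value for a probability
    mean = clicks / impressions
    width = math.sqrt(math.log(1.0 / delta) / (2.0 * impressions))
    return max(0.0, mean - width)


def pessimistic_top_k(logged, k, delta=0.05):
    """Return the k items with the highest lower confidence bounds.

    Under the assumed additive list value, taking the top-k items by LCB
    maximizes the pessimistic estimate of the list's value.
    """
    scores = {item: hoeffding_lcb(c, n, delta) for item, (c, n) in logged.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical logged data: item -> (clicks, impressions).
logged = {
    "a": (1, 1),       # empirical mean 1.00, but a single impression
    "b": (450, 1000),  # empirical mean 0.45, well explored
    "c": (300, 1000),  # empirical mean 0.30, well explored
    "d": (3, 5),       # empirical mean 0.60, few impressions
}

print(pessimistic_top_k(logged, k=2))  # -> ['b', 'c']
```

Ranking by empirical mean alone would pick the rarely logged items "a" and "d". The pessimistic ranking prefers "b" and "c", whose estimates are backed by many more impressions.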

Comments:	13 pages, 10 figures, to be published in ECAI 2024
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
Cite as:	arXiv:2206.02593 [cs.LG]
	(or arXiv:2206.02593v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2206.02593
Related DOI:	https://doi.org/10.3233/FAIA240703

Submission history

From: Matej Cief
[v1] Mon, 6 Jun 2022 12:58:28 UTC (142 KB)
[v2] Fri, 19 Aug 2022 08:55:55 UTC (243 KB)
[v3] Wed, 1 Feb 2023 13:08:09 UTC (392 KB)
[v4] Fri, 23 Aug 2024 09:19:15 UTC (350 KB)
