Off-Policy Evaluation of Ranking Policies under Diverse User Behavior

Kiyohara, Haruka; Uehara, Masatoshi; Narita, Yusuke; Shimizu, Nobuyuki; Yamamoto, Yasuo; Saito, Yuta

doi:10.1145/3580305.3599447

Statistics > Machine Learning

arXiv:2306.15098 (stat)

[Submitted on 26 Jun 2023]


Abstract: Ranking interfaces are everywhere in online platforms. There is thus an ever-growing interest in their Off-Policy Evaluation (OPE), which aims to accurately evaluate the performance of ranking policies using logged data. A de facto approach for OPE is Inverse Propensity Scoring (IPS), which provides an unbiased and consistent value estimate. However, IPS becomes extremely inaccurate in the ranking setup because its variance grows with the large action space of possible rankings. To deal with this problem, previous studies assume either independent or cascade user behavior, resulting in ranking-specific variants of IPS. While these estimators are somewhat effective in reducing the variance, all existing estimators apply a single universal assumption to every user, causing excessive bias and variance. Therefore, this work explores a far more general formulation where user behavior is diverse and can vary depending on the user context. We show that the resulting estimator, which we call Adaptive IPS (AIPS), can be unbiased under any complex user behavior. Moreover, AIPS achieves the minimum variance among all unbiased estimators based on IPS. We further develop a procedure to identify, in a data-driven fashion, the user behavior model that minimizes the mean squared error (MSE) of AIPS. Extensive experiments demonstrate that the empirical accuracy improvement can be significant, enabling effective OPE of ranking systems even under diverse user behavior.
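The estimators contrasted in the abstract differ mainly in which positions of a ranking enter the importance weight applied to the reward observed at each position. The following is a minimal sketch of that idea, not the authors' implementation. It assumes (1) the value is the expected sum of position-wise rewards with no position weights, (2) both policies are factorized, choosing the item at each position independently, so the probability of any subset of positions is the product of per-position probabilities, and (3) each logged context comes with a known behavior label. The helper names `relevant_positions` and `aips_estimate` and the three labels "independent", "cascade" and "standard" are hypothetical. Giving every sample the same label yields an independent, cascade-style, or standard IPS estimate; letting the label vary per context illustrates the adaptive idea.

```python
import numpy as np


def relevant_positions(behavior: str, k: int, n_positions: int) -> list[int]:
    """Hypothetical helper: positions whose items are assumed to affect the reward at position k."""
    if behavior == "independent":  # reward at k depends only on the item at k
        return [k]
    if behavior == "cascade":  # reward at k depends on items at positions 0..k
        return list(range(k + 1))
    if behavior == "standard":  # reward at k may depend on the whole ranking
        return list(range(n_positions))
    raise ValueError(f"unknown behavior: {behavior!r}")


def aips_estimate(actions, rewards, pi_e, pi_b, behaviors):
    """Sketch of an adaptive IPS estimate of the expected sum of position-wise rewards.

    actions:   (n, L) int array, item logged at each position
    rewards:   (n, L) float array, reward observed at each position
    pi_e/pi_b: (n, L, n_items) per-position probabilities of the evaluation and
               logging policies (ASSUMPTION: factorized policies)
    behaviors: length-n sequence of hypothetical behavior labels, one per context
    """
    n, L = actions.shape
    rows = np.arange(n)[:, None]
    cols = np.arange(L)[None, :]
    # Per-position importance ratios pi_e(a_k | x) / pi_b(a_k | x), shape (n, L).
    ratios = pi_e[rows, cols, actions] / pi_b[rows, cols, actions]
    total = 0.0
    for i in range(n):
        for k in range(L):
            phi = relevant_positions(behaviors[i], k, L)
            # Under the factorized assumption, the marginal weight over the
            # positions in phi is the product of their per-position ratios.
            total += np.prod(ratios[i, phi]) * rewards[i, k]
    return total / n


# Usage: two identical logged rankings, one from an "independent" user and
# one from a "cascade" user.
pi_b = np.full((2, 2, 2), 0.5)                   # uniform logging policy
pi_e = np.array([[[0.8, 0.2], [0.3, 0.7]]] * 2)  # evaluation policy
actions = np.array([[0, 1], [0, 1]])
rewards = np.ones((2, 2))
print(aips_estimate(actions, rewards, pi_e, pi_b, ["independent", "cascade"]))  # ≈ 3.42
```

Roughly speaking, a wider set of relevant positions multiplies in more ratios, which inflates variance, while a set that omits a position that truly affects the reward introduces bias. Choosing the set per context is what lets an adaptive estimator trade these off. Real ranking policies are usually not factorized, in which case the marginal probabilities over a subset of positions must be computed differently from the simple product used here.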

Comments:	KDD2023 Research track
Subjects:	Machine Learning (stat.ML); Information Retrieval (cs.IR); Machine Learning (cs.LG)
Cite as:	arXiv:2306.15098 [stat.ML]
	(or arXiv:2306.15098v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2306.15098
Related DOI:	https://doi.org/10.1145/3580305.3599447

Submission history

From: Haruka Kiyohara [view email]
[v1] Mon, 26 Jun 2023 22:31:15 UTC (6,150 KB)
