Off-Policy Evaluation for Large Action Spaces via Conjunct Effect Modeling

Saito, Yuta; Ren, Qingyang; Joachims, Thorsten

Statistics > Machine Learning

arXiv:2305.08062 (stat)

[Submitted on 14 May 2023 (v1), last revised 2 Jun 2023 (this version, v2)]

Abstract: We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action spaces, where conventional importance-weighting approaches suffer from excessive variance. To circumvent this variance issue, we propose a new estimator, called OffCEM, that is based on the conjunct effect model (CEM), a novel decomposition of the causal effect into a cluster effect and a residual effect. OffCEM applies importance weighting only to action clusters and addresses the residual causal effect through model-based reward estimation. We show that the proposed estimator is unbiased under a new condition, called local correctness, which only requires that the residual-effect model preserves the relative expected reward differences of the actions within each cluster. To best leverage the CEM and local correctness, we also propose a new two-step procedure for performing model-based estimation that minimizes bias in the first step and variance in the second step. We find that the resulting OffCEM estimator substantially improves bias and variance compared to a range of conventional estimators. Experiments demonstrate that OffCEM provides substantial improvements in OPE, especially in the presence of many actions.
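The abstract describes OffCEM only in words. As a rough illustration, the Python sketch below assumes the estimator takes a doubly-robust-style form in which the importance weight is computed over clusters rather than individual actions: V_hat = (1/n) Σ_i [ π(c_i|x_i) / π0(c_i|x_i) · (r_i − f_hat(x_i, a_i)) + Σ_a π(a|x_i) f_hat(x_i, a) ], where π(c|x) is the sum of π(a|x) over the actions a in cluster c. The function name `offcem_estimate`, the fixed (context-independent) action-to-cluster mapping, and the array layout are hypothetical choices for illustration. The reward model f_hat is taken as given; the paper's two-step fitting procedure is not implemented here.

```python
import numpy as np


def offcem_estimate(
    pi_e: np.ndarray,               # (n, |A|) target-policy probabilities pi(a | x_i)
    pi_0: np.ndarray,               # (n, |A|) logging-policy probabilities pi_0(a | x_i)
    actions: np.ndarray,            # (n,) logged action indices a_i
    rewards: np.ndarray,            # (n,) observed rewards r_i
    action_to_cluster: np.ndarray,  # (|A|,) cluster id of each action (assumed fixed)
    f_hat: np.ndarray,              # (n, |A|) reward-model predictions f_hat(x_i, a)
) -> float:
    """Sketch of a cluster-weighted, doubly-robust-style OPE estimate (assumed form)."""
    n = pi_e.shape[0]
    n_clusters = int(action_to_cluster.max()) + 1

    # One-hot map from actions to clusters, shape (|A|, |C|).
    onehot = np.eye(n_clusters)[action_to_cluster]

    # Marginalize action probabilities into cluster probabilities, shape (n, |C|).
    pi_e_c = pi_e @ onehot
    pi_0_c = pi_0 @ onehot

    rows = np.arange(n)
    logged_clusters = action_to_cluster[actions]

    # Importance weights on clusters only (assumes pi_0 covers every logged cluster).
    w = pi_e_c[rows, logged_clusters] / pi_0_c[rows, logged_clusters]

    # Residual of the observed reward relative to the reward model.
    residual = rewards - f_hat[rows, actions]

    # Model-based term: expected f_hat under the target policy for each context.
    direct = (pi_e * f_hat).sum(axis=1)

    return float(np.mean(w * residual + direct))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, n_actions = 1000, 50
    pi_0 = rng.dirichlet(np.ones(n_actions), size=n)
    pi_e = rng.dirichlet(np.ones(n_actions), size=n)
    actions = np.array([rng.choice(n_actions, p=p) for p in pi_0])
    rewards = rng.normal(size=n)
    clusters = np.arange(n_actions) % 5   # hypothetical grouping into 5 clusters
    f_hat = np.zeros((n, n_actions))      # trivial reward model for the demo
    print(offcem_estimate(pi_e, pi_0, actions, rewards, clusters, f_hat))
```

Two limiting cases of this sketch help build intuition: with every action in its own cluster and f_hat = 0, it reduces to standard inverse propensity scoring, while with a single cluster all weights equal 1 and no importance weighting is applied at all.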

Comments:	Accepted at ICML 2023. arXiv admin note: text overlap with arXiv:2202.06317
Subjects:	Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2305.08062 [stat.ML]
	(or arXiv:2305.08062v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2305.08062

Submission history

From: Yuta Saito
[v1] Sun, 14 May 2023 04:16:40 UTC (3,987 KB)
[v2] Fri, 2 Jun 2023 20:52:40 UTC (3,966 KB)