Online Policy Learning and Inference by Matrix Completion

Duan, Congyuan; Li, Jingyang; Xia, Dong

arXiv:2404.17398 (stat)

[Submitted on 26 Apr 2024]

Abstract: Making online decisions can be challenging when features are sparse and orthogonal to historical ones, especially when the optimal policy is learned through collaborative filtering. We formulate the problem as a matrix completion bandit (MCB), where the expected reward under each arm is characterized by an unknown low-rank matrix. The $\epsilon$-greedy bandit and the online gradient descent algorithm are explored. Policy learning and regret performance are studied under a specific schedule for exploration probabilities and step sizes. A faster decaying exploration probability yields smaller regret but learns the optimal policy less accurately. We investigate an online debiasing method based on inverse propensity weighting (IPW) and a general framework for online policy inference. The IPW-based estimators are asymptotically normal under mild arm-optimality conditions. Numerical simulations corroborate our theoretical findings. Our methods are applied to the San Francisco parking pricing project data, revealing intriguing discoveries and outperforming the benchmark policy.
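The abstract names the ingredients ($\epsilon$-greedy exploration, online gradient descent, one unknown low-rank matrix per arm) but gives no algorithmic details. The Python sketch below is only an illustration of how these pieces could fit together, not the authors' procedure. Everything specific in it is an assumption: the function name `run_mcb`, uniform sampling of one matrix entry per round as the context, the factored form `U V^T` for each arm's estimate, the polynomial schedule `eps_t = t**(-gamma)`, the constant step size `eta`, and the Gaussian noise level. The IPW debiasing and inference steps are not shown.

```python
import numpy as np

def run_mcb(M, T=20000, rank=2, gamma=0.3, eta=0.05, noise=0.1, seed=0):
    """Illustrative epsilon-greedy matrix completion bandit (hypothetical sketch).

    M has shape (K, d1, d2): one true mean-reward matrix per arm. It is used
    only to simulate rewards and to measure regret, never by the learner.
    """
    rng = np.random.default_rng(seed)
    K, d1, d2 = M.shape
    # Factored estimate for each arm: M_hat[k] = U[k] @ V[k].T (assumed form).
    U = 0.1 * rng.standard_normal((K, d1, rank))
    V = 0.1 * rng.standard_normal((K, d2, rank))
    regret = 0.0
    for t in range(1, T + 1):
        # Context: a single entry (i, j) drawn uniformly (an assumption).
        i, j = rng.integers(d1), rng.integers(d2)
        # Current estimated reward of every arm at entry (i, j).
        est = np.einsum("kr,kr->k", U[:, i, :], V[:, j, :])
        # Decaying exploration probability (assumed schedule).
        eps = min(1.0, t ** (-gamma))
        if rng.random() < eps:
            a = int(rng.integers(K))   # explore: uniform over arms
        else:
            a = int(np.argmax(est))    # exploit: greedy arm
        reward = M[a, i, j] + noise * rng.standard_normal()
        regret += M[:, i, j].max() - M[a, i, j]
        # One online gradient step on the squared loss of the pulled arm.
        resid = est[a] - reward
        u_old = U[a, i].copy()
        U[a, i] -= eta * resid * V[a, j]
        V[a, j] -= eta * resid * u_old
    return U, V, regret
```

A larger `gamma` makes exploration decay faster, which is the knob behind the trade-off the abstract describes between regret and the accuracy of the learned policy. To try the sketch, build `M` from random low-rank factors, for example `np.einsum("kir,kjr->kij", A, B)` with `A` of shape `(K, d1, r)` and `B` of shape `(K, d2, r)`, and compare the returned regret across values of `gamma`.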

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2404.17398 [stat.ML]
	(or arXiv:2404.17398v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2404.17398

Submission history

From: Jingyang Li
[v1] Fri, 26 Apr 2024 13:19:27 UTC (945 KB)
