Stochastic Bandits with Linear Constraints

Pacchiano, Aldo; Ghavamzadeh, Mohammad; Bartlett, Peter; Jiang, Heinrich

arXiv:2006.10185 (cs)

[Submitted on 17 Jun 2020]

Abstract: We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies whose expected cumulative reward over the course of $T$ rounds is maximized, and each of which has an expected cost below a certain threshold $\tau$. We propose an upper-confidence-bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB), and prove an $\widetilde{\mathcal{O}}(\frac{d\sqrt{T}}{\tau-c_0})$ bound on its $T$-round regret, where the denominator is the difference between the constraint threshold $\tau$ and the cost $c_0$ of a known feasible action. We further specialize our results to multi-armed bandits and propose a computationally efficient algorithm for this setting. We prove a regret bound of $\widetilde{\mathcal{O}}(\frac{\sqrt{KT}}{\tau - c_0})$ for this algorithm in $K$-armed bandits, which is a $\sqrt{K}$ improvement over the $\widetilde{\mathcal{O}}(\frac{K\sqrt{T}}{\tau - c_0})$ bound obtained by simply casting multi-armed bandits as an instance of contextual linear bandits (with $d = K$) and applying the regret bound of OPLB. We also prove a lower bound for the problem studied in the paper and provide simulations to validate our theoretical results.
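The abstract describes the optimistic-reward, pessimistic-cost idea only at a high level. The following is a minimal sketch for the $K$-armed case under stated assumptions; it is not the authors' OPLB or their multi-armed algorithm. Assumed here: rewards and costs are Bernoulli, arm 0 is the known feasible action whose cost $c_0 < \tau$ is known exactly, and the confidence width `sqrt(2 log(t+1) / n)` and the reward-bonus multiplier `alpha_r` are hypothetical choices. Each round solves a linear program over policies (distributions over arms) using optimistic rewards and pessimistic costs; because there is a single cost constraint, an optimal vertex mixes at most two arms, so the program is solved exactly by enumeration. The function names `solve_constrained_lp` and `run_constrained_bandit` are hypothetical.

```python
import math
import random


def solve_constrained_lp(rewards, costs, tau):
    """Exactly solve: max_pi sum(pi*rewards) s.t. sum(pi*costs) <= tau, pi in simplex.

    With one inequality plus the simplex, an optimal vertex puts mass on at most
    two arms, so enumerating single arms and two-arm mixtures is exact.
    Returns a probability vector, or None if no arm is feasible.
    """
    k = len(rewards)
    best_val, best_pi = -math.inf, None
    for a in range(k):  # feasible single arms
        if costs[a] <= tau and rewards[a] > best_val:
            best_val, best_pi = rewards[a], [1.0 if i == a else 0.0 for i in range(k)]
    for a in range(k):      # a: cheap arm (cost <= tau)
        for b in range(k):  # b: expensive arm (cost > tau)
            if costs[a] <= tau < costs[b]:
                p = (tau - costs[a]) / (costs[b] - costs[a])  # weight on b that makes the constraint tight
                val = (1 - p) * rewards[a] + p * rewards[b]
                if val > best_val:
                    pi = [0.0] * k
                    pi[a], pi[b] = 1 - p, p
                    best_val, best_pi = val, pi
    return best_pi


def run_constrained_bandit(reward_means, cost_means, tau, T, alpha_r=1.0, seed=0):
    """Hypothetical simulation; arm 0 is the known-safe arm with known cost c0 < tau."""
    rng = random.Random(seed)
    k = len(reward_means)
    c0 = cost_means[0]
    assert c0 < tau, "the safe arm must be strictly feasible"
    n = [0] * k
    r_sum, c_sum = [0.0] * k, [0.0] * k
    total_r = total_c = 0.0
    for t in range(1, T + 1):
        ucb_r, ucb_c = [], []
        for a in range(k):
            if n[a] == 0:
                # Untried arm: most optimistic reward, most pessimistic cost.
                mean_r, mean_c, width = 1.0, 1.0, 1.0
            else:
                mean_r, mean_c = r_sum[a] / n[a], c_sum[a] / n[a]
                width = min(1.0, math.sqrt(2.0 * math.log(t + 1) / n[a]))
            ucb_r.append(mean_r + alpha_r * width)  # optimistic reward estimate
            ucb_c.append(min(1.0, mean_c + width))  # pessimistic (upper) cost estimate
        ucb_c[0] = c0  # the safe arm's cost is known exactly
        pi = solve_constrained_lp(ucb_r, ucb_c, tau)  # never None: arm 0 is feasible
        a = rng.choices(range(k), weights=pi)[0]
        r = 1.0 if rng.random() < reward_means[a] else 0.0
        c = 1.0 if rng.random() < cost_means[a] else 0.0
        n[a] += 1
        r_sum[a] += r
        c_sum[a] += c
        total_r += r
        total_c += c
    return total_r / T, total_c / T, n


if __name__ == "__main__":
    avg_r, avg_c, counts = run_constrained_bandit(
        [0.3, 0.6, 0.9], [0.1, 0.4, 0.9], tau=0.5, T=2000)
    print(avg_r, avg_c, counts)
```

In this sketch, using an upper bound on each arm's cost keeps the chosen policy's true expected cost below $\tau$ with high probability, the known safe arm guarantees the per-round program always has a feasible solution, and the optimistic reward estimate drives exploration toward arms whose cost is still uncertain.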

Comments:	9 pages
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2006.10185 [cs.LG]
	(or arXiv:2006.10185v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2006.10185

Submission history

From: Aldo Pacchiano
[v1] Wed, 17 Jun 2020 22:32:19 UTC (733 KB)