Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits

Kveton, Branislav; Wen, Zheng; Ashkan, Azin; Szepesvari, Csaba

arXiv:1410.0949v2 (cs)

[Submitted on 3 Oct 2014 (v1), revised 26 Oct 2014 (this version, v2), latest version 27 Jan 2015 (v3)]

Authors: Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

Abstract: A stochastic combinatorial semi-bandit with a linear payoff is a sequential learning problem where at each step a learning agent chooses a subset of ground items subject to some combinatorial constraints, then observes noisy weights of all chosen items, and finally receives their sum as a payoff. In this work, we close the problem of computationally and sample-efficient learning in stochastic combinatorial semi-bandits. In particular, we show that a relatively simple learning algorithm, which is known to be computationally efficient, also achieves near-optimal regret. We refer to this method as CombUCB1, and show that its $n$-step regret is $O(K L (1 / \Delta) \log n)$ and $O(\sqrt{K L n \log n})$, where $L$ is the number of ground items, $K$ is the maximum number of chosen items, and $\Delta$ is the gap between the expected weights of the best and second-best solutions. The $O(K L (1 / \Delta) \log n)$ upper bound is tight up to a constant factor and the $O(\sqrt{K L n \log n})$ upper bound is tight up to a factor of $\sqrt{\log n}$.
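To make the setting in the abstract concrete, the following is a minimal Python sketch of a CombUCB1-style learner. It is an illustration under stated assumptions, not the authors' implementation: the feasible sets are taken to be all subsets of exactly $K$ items (so the optimization step reduces to picking the $K$ largest indices), item weights are Bernoulli, and the confidence radius $\sqrt{1.5 \log t / T(e)}$ is a standard UCB1-style choice that the abstract itself does not specify. The function name `comb_ucb1_top_k` and its parameters are hypothetical.

```python
import math
import random


def comb_ucb1_top_k(means, K, n, seed=0):
    """Run a CombUCB1-style learner for n steps and return its expected regret.

    Assumptions (not taken from the abstract): feasible sets are all K-subsets
    of the L ground items, and item weights are Bernoulli with the given means.
    """
    L = len(means)
    if not 1 <= K <= L:
        raise ValueError("K must satisfy 1 <= K <= L")

    rng = random.Random(seed)
    counts = [0] * L       # T(e): how many times item e has been observed
    estimates = [0.0] * L  # running average of the observed weights of item e

    def observe(chosen):
        # Semi-bandit feedback: one noisy weight per chosen item.
        for e in chosen:
            w = 1.0 if rng.random() < means[e] else 0.0
            counts[e] += 1
            estimates[e] += (w - estimates[e]) / counts[e]

    best = sum(sorted(means, reverse=True)[:K])  # expected payoff of the best set
    regret = 0.0

    for t in range(1, n + 1):
        unseen = [e for e in range(L) if counts[e] == 0]
        if unseen:
            # Initialization: force items that were never observed into the set,
            # padding with other items so that exactly K items are chosen.
            chosen = unseen[:K]
            chosen += [e for e in range(L) if e not in chosen][: K - len(chosen)]
        else:
            # Upper confidence bound per item (radius constant 1.5 is assumed).
            ucb = [
                estimates[e] + math.sqrt(1.5 * math.log(t) / counts[e])
                for e in range(L)
            ]
            # Linear payoff + K-subset constraint: the best set under the UCBs
            # is simply the K items with the largest indices.
            chosen = sorted(range(L), key=lambda e: ucb[e], reverse=True)[:K]

        observe(chosen)
        regret += best - sum(means[e] for e in chosen)

    return regret


if __name__ == "__main__":
    print(comb_ucb1_top_k([0.9, 0.8, 0.2, 0.1, 0.1], K=2, n=2000))
```

For other combinatorial constraints (for example paths or matchings), the `sorted(...)[:K]` line would be replaced by a call to an offline optimization oracle for that constraint; the rest of the loop stays the same.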

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Machine Learning (stat.ML)
Cite as:	arXiv:1410.0949 [cs.LG]
	(or arXiv:1410.0949v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1410.0949

Submission history

From: Branislav Kveton
[v1] Fri, 3 Oct 2014 19:38:16 UTC (51 KB)
[v2] Sun, 26 Oct 2014 04:30:17 UTC (102 KB)
[v3] Tue, 27 Jan 2015 05:15:20 UTC (114 KB)
