A Unifying Theory of Thompson Sampling for Continuous Risk-Averse Bandits

Chang, Joel Q. L.; Tan, Vincent Y. F.

Computer Science > Machine Learning

arXiv:2108.11345 (cs)

[Submitted on 25 Aug 2021 (v1), last revised 17 Apr 2022 (this version, v4)]


Abstract: This paper unifies the design and the analysis of risk-averse Thompson sampling algorithms for the multi-armed bandit problem, covering a class of risk functionals $\rho$ that are continuous and dominant. We prove generalised concentration bounds for these continuous and dominant risk functionals and show that a wide class of popular risk functionals belongs to this class. Using our newly developed analytical toolkit, we analyse the algorithm $\rho$-MTS (for multinomial distributions) and prove that it admits asymptotically optimal regret bounds for risk-averse algorithms under CVaR, proportional hazard, and other ubiquitous risk measures. More generally, we prove the asymptotic optimality of $\rho$-MTS for Bernoulli distributions for a class of risk measures known as empirical distribution performance measures (EDPMs); this class includes the well-known mean-variance measure. Numerical simulations show that the regret bounds of our algorithms are reasonably tight vis-à-vis algorithm-independent lower bounds.
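To make the setting concrete, the following is a minimal sketch of risk-averse Thompson sampling over multinomial arms. It assumes rewards take values on a finite grid, each arm starts from a uniform Dirichlet(1, ..., 1) prior over its outcome distribution, and the arm whose sampled distribution has the largest risk value $\rho$ is pulled. The names `cvar`, `mean_variance` and `rho_mts` are hypothetical, the mean-variance score is written as mean minus $\gamma$ times variance so that larger is better (an assumed sign convention), and this is an illustration only, not the authors' implementation.

```python
import numpy as np


def cvar(probs, support, alpha):
    """Lower-tail CVaR_alpha: expected reward over the worst alpha-fraction of outcomes.

    Assumes `support` is sorted in increasing order and `probs` sums to 1.
    """
    remaining = alpha  # probability mass still to be taken from the lower tail
    total = 0.0
    for p, x in zip(probs, support):
        take = min(p, remaining)
        total += take * x
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


def mean_variance(probs, support, gamma):
    """Hypothetical 'higher is better' mean-variance score: mean - gamma * variance."""
    mean = float(np.dot(probs, support))
    var = float(np.dot(probs, (support - mean) ** 2))
    return mean - gamma * var


def rho_mts(arms, support, rho, horizon, rng):
    """Sketch of Thompson sampling over multinomial arms driven by a risk functional rho.

    arms: true probability vectors over `support`, used only to simulate rewards.
    Each arm keeps Dirichlet(1 + counts) as its posterior; every round we draw one
    distribution per arm and pull the arm whose draw has the largest rho value.
    """
    n_atoms = len(support)
    counts = np.zeros((len(arms), n_atoms))
    pulls = np.zeros(len(arms), dtype=int)
    for _ in range(horizon):
        scores = [rho(rng.dirichlet(1.0 + counts[k]), support) for k in range(len(arms))]
        k = int(np.argmax(scores))
        outcome = rng.choice(n_atoms, p=arms[k])  # index of the observed reward atom
        counts[k, outcome] += 1
        pulls[k] += 1
    return pulls


# Toy instance: arm 1 has the higher mean (0.75 vs 0.7) but a 25% chance of reward 0.
support = np.array([0.0, 0.5, 1.0])
arms = [np.array([0.0, 0.6, 0.4]), np.array([0.25, 0.0, 0.75])]
rng = np.random.default_rng(0)
pulls = rho_mts(arms, support, lambda p, s: cvar(p, s, alpha=0.2), horizon=2000, rng=rng)
print(pulls)
```

In this toy instance the first arm has CVaR 0.5 at level 0.2, while the second arm's CVaR at that level is 0 because its worst 20% of outcomes are all zero rewards. A CVaR-driven sampler should therefore concentrate its pulls on the first arm, even though a risk-neutral sampler would favour the second. Swapping in `lambda p, s: mean_variance(p, s, gamma=1.0)` changes only the risk functional, which reflects the abstract's point that one sampling scheme can serve many risk measures.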

Comments:	Accepted to the Association for the Advancement of Artificial Intelligence (AAAI) 2022
Subjects:	Machine Learning (cs.LG); Information Theory (cs.IT); Machine Learning (stat.ML)
Cite as:	arXiv:2108.11345 [cs.LG]
	(or arXiv:2108.11345v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2108.11345

Submission history

From: Joel Q. L. Chang
[v1] Wed, 25 Aug 2021 17:09:01 UTC (299 KB)
[v2] Wed, 8 Dec 2021 09:13:45 UTC (317 KB)
[v3] Fri, 25 Feb 2022 09:48:48 UTC (54 KB)
[v4] Sun, 17 Apr 2022 15:11:52 UTC (54 KB)
