Learning Modular Safe Policies in the Bandit Setting with Application to Adaptive Clinical Trials

Aboutalebi, Hossein; Precup, Doina; Schuster, Tibor

arXiv:1903.01026v1 (cs)

[Submitted on 4 Mar 2019 (this version), latest version 9 Jun 2019 (v3)]

Abstract: The stochastic multi-armed bandit problem is a well-known model for studying the exploration-exploitation trade-off. It has significant potential applications in adaptive clinical trials, which allow the treatment allocation probabilities of patients to change dynamically. However, most bandit learning algorithms are designed to minimize the expected regret. While this approach is useful in many areas, in clinical trials it can be sensitive to outlier data, especially when the sample size is small. In this paper, we define and study a new robustness criterion for bandit problems. Specifically, we consider optimizing a function of the distribution of returns as a regret measure. This gives practitioners more flexibility to define an appropriate regret measure. The learning algorithm we propose for this type of problem is a modification of the BESA algorithm [Baransi et al., 2014] that accommodates a more general version of regret. We present a regret bound for our approach and evaluate it empirically on both synthetic problems and a dataset from the clinical trial literature. Our approach compares favorably to a suite of standard bandit algorithms.
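
To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' algorithm. It shows a two-armed BESA-style selection rule (subsample the more-pulled arm down to the size of the less-pulled arm, then compare) in which the comparison statistic is a pluggable function of the observed returns rather than the empirical mean. The function names (`besa_choose`, `run`, `mean_score`, `mean_minus_std`), the mean-minus-standard-deviation score, and the `penalty` parameter are all hypothetical and chosen only for illustration; the paper's actual regret measure and modifications may differ.

```python
import random
import statistics


def mean_score(samples):
    """Classical BESA statistic: the empirical mean of the subsample."""
    return statistics.fmean(samples)


def mean_minus_std(samples, penalty=1.0):
    """Hypothetical risk-aware score: mean minus a spread penalty.

    This is an illustrative assumption, not the measure used in the paper.
    """
    return statistics.fmean(samples) - penalty * statistics.pstdev(samples)


def besa_choose(history_a, history_b, score, rng):
    """Pick arm 0 or 1 by comparing equal-size subsamples of past rewards."""
    n = min(len(history_a), len(history_b))
    # Sampling without replacement; the shorter history is used in full.
    sub_a = rng.sample(history_a, n)
    sub_b = rng.sample(history_b, n)
    s_a, s_b = score(sub_a), score(sub_b)
    if s_a == s_b:
        # Tie-break in favor of the arm that has been pulled less often.
        return 0 if len(history_a) <= len(history_b) else 1
    return 0 if s_a > s_b else 1


def run(arms, horizon, score, seed=0):
    """Play two arms for `horizon` rounds; return the pull count per arm.

    `arms` is a pair of callables that take an rng and return a reward.
    """
    rng = random.Random(seed)
    histories = [[arm(rng)] for arm in arms]  # pull each arm once to start
    for _ in range(horizon - len(arms)):
        k = besa_choose(histories[0], histories[1], score, rng)
        histories[k].append(arms[k](rng))
    return [len(h) for h in histories]


if __name__ == "__main__":
    steady = lambda rng: 0.5                      # constant reward
    risky = lambda rng: rng.choice([0.0, 1.2])    # higher mean, high spread
    print("mean score pulls:", run([steady, risky], 200, mean_score))
    print("risk-aware pulls:", run([steady, risky], 200, mean_minus_std))
```

Swapping `mean_score` for `mean_minus_std` changes which arm counts as better without touching the selection loop, which is the kind of modularity the abstract describes.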

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Cite as:	arXiv:1903.01026 [cs.LG]
	(or arXiv:1903.01026v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1903.01026

Submission history

From: Hossein Aboutalebi
[v1] Mon, 4 Mar 2019 00:42:41 UTC (7,554 KB)
[v2] Sun, 24 Mar 2019 17:59:48 UTC (3,076 KB)
[v3] Sun, 9 Jun 2019 15:18:08 UTC (10,068 KB)
