PG-TS: Improved Thompson Sampling for Logistic Contextual Bandits

Dumitrascu, Bianca; Feng, Karen; Engelhardt, Barbara E

arXiv:1805.07458 (stat)

[Submitted on 18 May 2018]

Abstract: We address the problem of regret minimization in logistic contextual bandits, where a learner decides among sequential actions or arms given their respective contexts to maximize binary rewards. Using a fast inference procedure with Polya-Gamma distributed augmentation variables, we propose an improved version of Thompson Sampling, a Bayesian formulation of contextual bandits with near-optimal performance. Our approach, Polya-Gamma augmented Thompson Sampling (PG-TS), achieves state-of-the-art performance on simulated and real data. PG-TS explores the action space efficiently and exploits high-reward arms, quickly converging to solutions of low regret. Its explicit estimation of the posterior distribution of the context feature covariance leads to substantial empirical gains over approximate approaches. PG-TS is the first approach to demonstrate the benefits of Polya-Gamma augmentation in bandits and to propose an efficient Gibbs sampler for approximating the analytically unsolvable integral of logistic contextual bandits.
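The abstract does not spell out the sampler, so the following is only a minimal sketch of how Polya-Gamma augmented Thompson Sampling could look, not the authors' implementation. It assumes a Gaussian prior N(0, I) on the weight vector, uses the standard Polya-Gamma Gibbs updates for Bayesian logistic regression, and approximates PG(1, c) draws with a truncated sum of gamma variables rather than an exact sampler. The names `sample_pg_approx`, `gibbs_logistic`, `pg_ts`, `contexts_fn`, and `reward_fn` are hypothetical and chosen for illustration.

```python
import numpy as np

def sample_pg_approx(c, rng, n_terms=200):
    """Approximate PG(1, c) draws using the truncated series
    (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)), g_k ~ Gamma(1, 1).
    Truncation makes this an approximation, not an exact sampler."""
    c = np.atleast_1d(np.asarray(c, dtype=float))
    k = np.arange(1, n_terms + 1)
    g = rng.gamma(shape=1.0, scale=1.0, size=(c.size, n_terms))
    denom = (k - 0.5) ** 2 + (c[:, None] ** 2) / (4.0 * np.pi ** 2)
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

def gibbs_logistic(X, y, prior_mean, prior_cov, rng, n_iter=20):
    """Run n_iter Gibbs sweeps and return the last draw of theta."""
    prior_prec = np.linalg.inv(prior_cov)
    theta = prior_mean.copy()
    kappa = y - 0.5
    for _ in range(n_iter):
        # Augmentation variables: omega_i | theta ~ PG(1, x_i^T theta)
        omega = sample_pg_approx(X @ theta, rng)
        # Conditionally Gaussian weights: theta | omega, y ~ N(m, V)
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + prior_prec)
        V = 0.5 * (V + V.T)  # symmetrize against round-off
        m = V @ (X.T @ kappa + prior_prec @ prior_mean)
        theta = rng.multivariate_normal(m, V)
    return theta

def pg_ts(contexts_fn, reward_fn, d, n_rounds, rng, n_gibbs=20):
    """Thompson Sampling loop: sample theta, play the greedy arm, record the reward."""
    prior_mean, prior_cov = np.zeros(d), np.eye(d)  # assumed prior
    X_hist, y_hist = np.empty((0, d)), np.empty(0)
    theta = prior_mean
    for t in range(n_rounds):
        arms = contexts_fn(t)  # array of shape (n_arms, d)
        theta = gibbs_logistic(X_hist, y_hist, prior_mean, prior_cov, rng, n_gibbs)
        # The logistic link is monotone, so maximizing x^T theta suffices.
        a = int(np.argmax(arms @ theta))
        r = reward_fn(arms[a])  # binary reward in {0, 1}
        X_hist = np.vstack([X_hist, arms[a]])
        y_hist = np.append(y_hist, r)
    return theta, X_hist, y_hist
```

In this sketch the Gibbs chain is restarted from the prior mean each round and the full history is refit, which is simple but costs a matrix inversion per sweep; the number of sweeps `n_gibbs` and the truncation level `n_terms` are illustrative defaults, not values taken from the paper.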

Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:1805.07458 [stat.ML]
	(or arXiv:1805.07458v1 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1805.07458

Submission history

From: Bianca Dumitrascu
[v1] Fri, 18 May 2018 22:06:38 UTC (2,589 KB)
