Learning Contextual Bandits in a Non-stationary Environment

Wu, Qingyun; Iyer, Naveen; Wang, Hongning

doi:10.1145/3209978.3210051


arXiv:1805.09365 (cs)

[Submitted on 23 May 2018]

Abstract: Multi-armed bandit algorithms have become a reference solution for handling the explore/exploit dilemma in recommender systems and in many other important real-world problems, such as display advertising. However, such algorithms usually assume a stationary reward distribution, which hardly holds in practice because users' preferences are dynamic. This inevitably leaves a recommender system with consistently suboptimal performance. In this paper, we consider the situation where the underlying reward distribution remains unchanged over (possibly short) epochs and shifts at unknown time instants. In response, we propose a contextual bandit algorithm that detects possible changes of environment based on its reward estimation confidence and updates its arm selection strategy accordingly. A rigorous upper regret bound analysis of the proposed algorithm demonstrates its learning effectiveness in such a non-trivial environment. Extensive empirical evaluations on both synthetic and real-world recommendation datasets confirm its practical utility in a changing environment.
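
The idea of detecting a change from reward estimation confidence can be illustrated with a minimal Python sketch. This is not the paper's algorithm: it is a simplified stand-in that keeps a single linear UCB model, counts how often an observed reward falls outside the model's own confidence interval, and resets the model when that happens too often within a recent window. The class name `ChangeDetectingLinUCB`, the helper `demo`, and the parameters `alpha`, `window`, `max_miss_rate`, and `noise` are all assumptions introduced for illustration.

```python
import math


class ChangeDetectingLinUCB:
    """Illustrative sketch only: linear UCB with a confidence-based reset.

    All names and default thresholds here are assumptions, not the paper's.
    """

    def __init__(self, dim, alpha=1.0, window=20, max_miss_rate=0.5, noise=0.1):
        self.dim = dim
        self.alpha = alpha                  # scale of the confidence width
        self.window = window                # number of recent checks remembered
        self.max_miss_rate = max_miss_rate  # miss fraction that triggers a reset
        self.noise = noise                  # slack allowed for reward noise
        self.resets = 0
        self._reset_model()

    def _reset_model(self):
        # Ridge regression state: A_inv = (I + sum x x^T)^-1, b = sum r * x.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(self.dim)]
                      for i in range(self.dim)]
        self.b = [0.0] * self.dim
        self.misses = []

    def _matvec(self, v):
        return [sum(row[j] * v[j] for j in range(self.dim)) for row in self.A_inv]

    def _estimate(self, x):
        """Return (predicted reward, confidence width) for context x."""
        theta = self._matvec(self.b)
        a_x = self._matvec(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = self.alpha * math.sqrt(sum(a * xi for a, xi in zip(a_x, x)))
        return mean, width

    def select(self, contexts):
        """Return the index of the arm with the highest upper confidence bound."""
        scores = [sum(self._estimate(x)) for x in contexts]
        return max(range(len(contexts)), key=scores.__getitem__)

    def update(self, x, reward):
        mean, width = self._estimate(x)
        # A reward outside the confidence interval counts as a "miss".
        self.misses.append(abs(reward - mean) > width + self.noise)
        self.misses = self.misses[-self.window:]
        if (len(self.misses) == self.window
                and sum(self.misses) / self.window > self.max_miss_rate):
            self._reset_model()  # suspected change: discard stale statistics
            self.resets += 1
        # Sherman-Morrison rank-one update of A_inv, then accumulate b.
        a_x = self._matvec(x)
        denom = 1.0 + sum(a * xi for a, xi in zip(a_x, x))
        for i in range(self.dim):
            for j in range(self.dim):
                self.A_inv[i][j] -= a_x[i] * a_x[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]


def demo(steps_per_phase=100):
    """Noise-free toy run in which the true reward weights flip halfway."""
    bandit = ChangeDetectingLinUCB(dim=2)
    arms = [[1.0, 0.0], [0.0, 1.0]]
    for t in range(2 * steps_per_phase):
        theta = [1.0, 0.0] if t < steps_per_phase else [0.0, 1.0]
        x = arms[t % 2]  # alternate arms so both are observed in every phase
        reward = sum(w * xi for w, xi in zip(theta, x))
        bandit.update(x, reward)
    return bandit


if __name__ == "__main__":
    bandit = demo()
    print("resets:", bandit.resets)
    print("preferred arm:", bandit.select([[1.0, 0.0], [0.0, 1.0]]))
```

In this toy run the arms are alternated rather than chosen by `select`, so that the detection logic can be followed step by step. A hard reset is the crudest possible response to a detected change; it was chosen here only to keep the sketch short.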

Comments:	10 pages, 13 figures, to appear at ACM Special Interest Group on Information Retrieval (SIGIR) 2018
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1805.09365 [cs.LG]
	(or arXiv:1805.09365v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1805.09365
Related DOI:	https://doi.org/10.1145/3209978.3210051

Submission history

From: Qingyun Wu
[v1] Wed, 23 May 2018 18:16:39 UTC (3,405 KB)
