State-Separated SARSA: A Practical Sequential Decision-Making Algorithm with Recovering Rewards

Tanimoto, Yuto; Fukumizu, Kenji

arXiv:2403.11520 (cs)

[Submitted on 18 Mar 2024]

Abstract: While many multi-armed bandit algorithms assume that rewards for all arms are constant across rounds, this assumption does not hold in many real-world scenarios. This paper considers the setting of recovering bandits (Pike-Burke & Grunewalder, 2019), where the reward depends on the number of rounds elapsed since the last time an arm was pulled. We propose a new reinforcement learning (RL) algorithm tailored to this setting, named the State-Separated SARSA (SS-SARSA) algorithm, which treats rounds as states. The SS-SARSA algorithm achieves efficient learning by reducing the number of state combinations required for Q-learning/SARSA, which often suffer from combinatorial issues in large-scale RL problems. Additionally, it makes minimal assumptions about the reward structure and offers lower computational complexity. Furthermore, we prove asymptotic convergence to an optimal policy under mild assumptions. Simulation studies demonstrate the superior performance of our algorithm across various settings.
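
To make the recovering-bandit setting concrete, the following is a minimal sketch of a toy environment in which each arm's reward depends on the rounds elapsed since it was last pulled. It is not the SS-SARSA algorithm itself, whose update rule is not given in the abstract. The class name `RecoveringBandit`, the cap `s_max`, the reward tables, the all-arms-recovered starting state, and the helper `joint_state_count` are all illustrative assumptions.

```python
import random


class RecoveringBandit:
    """Toy recovering bandit (illustrative assumption, not the paper's code).

    The state is a tuple holding, for each arm, the number of rounds since
    that arm was last pulled, capped at s_max.
    """

    def __init__(self, reward_tables, s_max, noise_sd=0.0, seed=0):
        # reward_tables[a][s - 1] is a hypothetical mean reward for arm a
        # when s rounds (1 <= s <= s_max) have elapsed since its last pull.
        self.reward_tables = reward_tables
        self.s_max = s_max
        self.noise_sd = noise_sd
        self.rng = random.Random(seed)
        # Assumed starting point: every arm is fully recovered.
        self.state = tuple([s_max] * len(reward_tables))

    def step(self, arm):
        elapsed = self.state[arm]
        noise = self.rng.gauss(0.0, self.noise_sd) if self.noise_sd > 0 else 0.0
        reward = self.reward_tables[arm][elapsed - 1] + noise
        # The pulled arm resets to 1; every other arm ages by one round,
        # capped at s_max.
        self.state = tuple(
            1 if k == arm else min(v + 1, self.s_max)
            for k, v in enumerate(self.state)
        )
        return reward, self.state


def joint_state_count(s_max, n_arms):
    """Number of joint states a naive tabular method would have to index."""
    return s_max ** n_arms


env = RecoveringBandit([[0.1, 0.5, 1.0], [0.2, 0.4, 0.6]], s_max=3)
first_reward, first_state = env.step(0)    # arm 0 fully recovered
second_reward, second_state = env.step(0)  # arm 0 pulled again immediately
```

In this sketch, pulling the same arm twice in a row yields the lowest entry of its table on the second pull. The count returned by `joint_state_count` grows exponentially in the number of arms, which is the kind of combinatorial growth the abstract says plain Q-learning/SARSA struggles with.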

Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:2403.11520 [cs.LG]
	(or arXiv:2403.11520v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.11520

Submission history

From: Yuto Tanimoto
[v1] Mon, 18 Mar 2024 07:14:21 UTC (1,603 KB)
