STIR$^2$: Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

Martin, Jesus Bujalance; Moutarde, Fabien

arXiv:2201.03834 (cs)

[Submitted on 11 Jan 2022 (v1), last revised 28 Feb 2023 (this version, v2)]

Authors: Jesus Bujalance Martin, Fabien Moutarde

Abstract: In the search for more sample-efficient reinforcement-learning (RL) algorithms, a promising direction is to leverage as much external off-policy data as possible, for instance expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method that can leverage both demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes (via relabeling), encouraging expert imitation and self-imitation. Our experiments focus on several robotic-manipulation tasks across two different simulation environments. We show that our method based on reward relabeling improves the performance of the base algorithms (SAC and DDPG) on these tasks. Finally, our best algorithm STIR$^2$ (Self and Teacher Imitation by Reward Relabeling), which integrates multiple improvements from previous works into our method, is more data-efficient than all baselines.
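
The relabeling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how large the bonus is, which transitions of an episode receive it, or how success is detected, so everything below is an assumption made for illustration: the `Transition` type, the `relabel_episode` helper, the constant `bonus` added to every transition, and the rule that an episode counts as successful if any sparse reward is positive. It is not the authors' implementation.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    """One replay-buffer entry (hypothetical layout, for illustration only)."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


def relabel_episode(episode: List[Transition],
                    is_demo: bool,
                    bonus: float = 1.0) -> List[Transition]:
    """Return a copy of the episode, adding a bonus if it is a demo or a success.

    Assumptions (not taken from the paper):
    - success means at least one transition has a positive sparse reward;
    - the same constant bonus is added to every transition of the episode.
    """
    success = any(t.reward > 0 for t in episode)
    if not (is_demo or success):
        # Failed online episodes are stored with their original rewards.
        return list(episode)
    return [replace(t, reward=t.reward + bonus) for t in episode]
```

Under these assumptions, the relabeled episodes would then be added to the replay buffer of an off-policy learner such as SAC or DDPG in place of the raw ones, so that demonstrations and successful rollouts carry higher returns than failed rollouts.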

Comments:	arXiv admin note: substantial text overlap with arXiv:2110.14464
Subjects:	Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2201.03834 [cs.LG]
	(or arXiv:2201.03834v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2201.03834

Submission history

From: Jesús Bujalance Martín
[v1] Tue, 11 Jan 2022 08:35:18 UTC (8,380 KB)
[v2] Tue, 28 Feb 2023 11:31:18 UTC (12,836 KB)