Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Hu, Yujing; Wang, Weixun; Jia, Hangtian; Wang, Yixiang; Chen, Yingfeng; Hao, Jianye; Wu, Feng; Fan, Changjie

Computer Science > Machine Learning

arXiv:2011.02669 (cs)

[Submitted on 5 Nov 2020]

Abstract: Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL). Existing approaches such as potential-based reward shaping normally make full use of a given shaping reward function. However, since the transformation of human knowledge into numeric reward values is often imperfect, for example because of human cognitive bias, completely utilizing the shaping reward function may fail to improve the performance of RL algorithms. In this paper, we consider the problem of adaptively utilizing a given shaping reward function. We formulate the utilization of shaping rewards as a bi-level optimization problem, where the lower level optimizes the policy using the shaping rewards and the upper level optimizes a parameterized shaping weight function for true reward maximization. We formally derive the gradient of the expected true reward with respect to the shaping weight function parameters and accordingly propose three learning algorithms based on different assumptions. Experiments in sparse-reward cartpole and MuJoCo environments show that our algorithms can fully exploit beneficial shaping rewards, and meanwhile ignore unbeneficial shaping rewards or even transform them into beneficial ones.
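
The bi-level idea in the abstract can be illustrated with a minimal sketch. This is not one of the paper's three algorithms. It is a generic one-step meta-gradient under stated assumptions: a single-state problem with discrete actions (a bandit), a softmax policy, exact expected-reward gradients instead of sampled trajectories, and a hypothetical tabular per-action weight phi[a] standing in for the parameterized shaping weight function, with the shaped reward assumed to be r(a) + phi[a] * f(a). The lower level takes a gradient step on the shaped reward; the upper level differentiates the true reward through that step with respect to phi. All function names and reward values are hypothetical.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_return(theta, rewards):
    # Expected reward sum_a pi(a) * R(a) under the softmax policy.
    pi = softmax(theta)
    return sum(p * x for p, x in zip(pi, rewards))


def policy_grad(theta, rewards):
    # Exact gradient of expected_return w.r.t. the logits: pi_k * (R_k - J).
    pi = softmax(theta)
    j = sum(p * x for p, x in zip(pi, rewards))
    return [p * (x - j) for p, x in zip(pi, rewards)]


def shaped(r, f, phi):
    # Assumed shaping form (hypothetical): true reward plus weighted shaping reward.
    return [ri + wi * fi for ri, fi, wi in zip(r, f, phi)]


def inner_step(theta, r, f, phi, alpha):
    # Lower level: one gradient-ascent step on the shaped expected reward.
    g = policy_grad(theta, shaped(r, f, phi))
    return [t + alpha * gi for t, gi in zip(theta, g)]


def meta_grad(theta, r, f, phi, alpha):
    # Upper level: d J_true(theta'(phi)) / d phi via the chain rule through inner_step.
    pi = softmax(theta)
    outer = policy_grad(inner_step(theta, r, f, phi, alpha), r)  # dJ_true / dtheta'
    n = len(theta)
    grad = []
    for j in range(n):
        # d theta'_k / d phi_j = alpha * pi_k * (delta_kj - pi_j) * f_j
        s = sum(outer[k] * pi[k] * ((1.0 if k == j else 0.0) - pi[j]) for k in range(n))
        grad.append(alpha * s * f[j])
    return grad


def train(r, f, learn_phi, steps=500, alpha=1.0, beta=1.0):
    theta = [0.0] * len(r)
    phi = [1.0] * len(r)  # start by fully trusting the shaping reward
    for _ in range(steps):
        if learn_phi:
            mg = meta_grad(theta, r, f, phi, alpha)
            phi = [w + beta * g for w, g in zip(phi, mg)]  # ascend the true reward
        theta = inner_step(theta, r, f, phi, alpha)  # ascend the shaped reward
    return softmax(theta), phi


r = [1.0, 0.0]  # true reward: action 0 is better
f = [0.0, 2.0]  # hypothetical misleading shaping reward that favours action 1
pi_fixed, _ = train(r, f, learn_phi=False)
pi_learned, phi = train(r, f, learn_phi=True)
print("fixed weight:", pi_fixed)
print("learned weight:", pi_learned, "phi:", phi)
```

With the weight fixed at 1, the policy follows the misleading shaping reward toward action 1. With the weight learned, phi[1] is driven below zero, so the shaping term becomes a penalty and the policy settles on action 0. This is only a toy analogue of the abstract's claim that unbeneficial shaping rewards can be ignored or transformed into beneficial ones.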

Comments:	Accepted by NeurIPS 2020
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2011.02669 [cs.LG]
	(or arXiv:2011.02669v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2011.02669

Submission history

From: Weixun Wang
[v1] Thu, 5 Nov 2020 05:34:14 UTC (8,521 KB)
