Foresee then Evaluate: Decomposing Value Estimation with Latent Future Prediction

Tang, Hongyao; Hao, Jianye; Chen, Guangyong; Chen, Pengfei; Chen, Chen; Yang, Yaodong; Zhang, Luo; Liu, Wulong; Meng, Zhaopeng

arXiv:2103.02225 (cs)

[Submitted on 3 Mar 2021]

Abstract: The value function is the central notion of Reinforcement Learning (RL). Value estimation, especially with function approximation, can be challenging because it involves the stochasticity of environment dynamics and reward signals that can be sparse and delayed. A typical model-free RL algorithm estimates the values of a policy with Temporal Difference (TD) or Monte Carlo (MC) algorithms directly from rewards, without explicitly taking the dynamics into account. In this paper, we propose Value Decomposition with Future Prediction (VDFP), which provides an explicit two-step view of the value estimation process: 1) first foresee the latent future, and 2) then evaluate it. We analytically decompose the value function into a latent future dynamics part and a policy-independent trajectory return part, which makes it possible to model latent dynamics and returns separately in value estimation. Further, we derive a practical deep RL algorithm consisting of a convolutional model that learns compact trajectory representations from past experiences, a conditional variational auto-encoder that predicts the latent future dynamics, and a convex return model that evaluates trajectory representations. In experiments, we demonstrate the effectiveness of our approach for both off-policy and on-policy RL on several OpenAI Gym continuous control tasks, as well as on a few challenging variants with delayed rewards.
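
As a rough illustration of the two-step idea (foresee, then evaluate), here is a minimal sketch under stated assumptions; it is not the authors' implementation. The latent predictor `sample_latent_future` is a hypothetical Gaussian stand-in for the conditional variational auto-encoder, and `return_model` is a hypothetical convex return model (a softplus of a linear score). The latent dimension, weights, and noise scale are invented for illustration and do not come from the paper. The value of a state is estimated by sampling latent futures and averaging the return model over them.

```python
import math
import random

# Hypothetical weights for the toy return model (not from the paper).
W = (1.0, 2.0)
B = 0.1


def sample_latent_future(state, rng):
    """Step 1 (foresee): draw one 2-D latent trajectory representation.

    Hypothetical stand-in for a learned conditional generative model:
    a Gaussian whose mean depends linearly on the (scalar) state.
    """
    mu = (0.5 * state, -0.2 * state)
    return tuple(m + rng.gauss(0.0, 1.0) for m in mu)


def return_model(z):
    """Step 2 (evaluate): hypothetical convex return model U(z).

    Softplus of an affine score; softplus is convex and nondecreasing,
    so its composition with an affine map is convex in z.
    """
    score = sum(w * zi for w, zi in zip(W, z)) + B
    # Numerically stable softplus: log(1 + exp(score)).
    return max(score, 0.0) + math.log1p(math.exp(-abs(score)))


def estimate_value(state, n_samples=1000, seed=0):
    """Return (average of U over sampled futures, U at the mean latent)."""
    rng = random.Random(seed)
    latents = [sample_latent_future(state, rng) for _ in range(n_samples)]
    v_sampled = sum(return_model(z) for z in latents) / n_samples
    mean_z = tuple(sum(z[i] for z in latents) / n_samples for i in range(len(W)))
    return v_sampled, return_model(mean_z)
```

For example, `estimate_value(1.0)` returns both quantities for a toy state. Because `return_model` is convex, evaluating it at the mean latent never exceeds the sampled average (Jensen's inequality). This is a general property of convex functions and is shown here only for illustration.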

Comments:	Accepted paper at AAAI 2021. arXiv admin note: text overlap with arXiv:1905.11100
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2103.02225 [cs.LG]
	(or arXiv:2103.02225v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2103.02225

Submission history

From: Hongyao Tang
[v1] Wed, 3 Mar 2021 07:28:56 UTC (4,670 KB)