Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof

Zhang, Yangchun; Liu, Qiang; Li, Weiming; Zhou, Yirui

arXiv:2403.14593 (cs)

[Submitted on 21 Mar 2024 (v1), last revised 14 May 2024 (this version, v3)]

Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 is Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (which requires multiple iterations) significantly enhances the efficiency of policy imitation. Criticism 2 is Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed yields a significant improvement in policy imitation, it introduces drawbacks to transferable reward recovery. We prove that the SAC algorithm by itself cannot comprehensively disentangle the reward function during the AIRL training process, and we propose a hybrid framework, PPO-AIRL + SAC, that achieves a satisfactory transfer effect. Criticism 3 is Unsatisfactory Proof from the Perspective of Potential Equilibrium. We reanalyze it from an algebraic theory perspective.

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2403.14593 [cs.LG]
(or arXiv:2403.14593v3 [cs.LG] for this version)
https://doi.org/10.48550/arXiv.2403.14593

Submission history

From: Y.C. Zhang
[v1] Thu, 21 Mar 2024 17:48:38 UTC (736 KB)
[v2] Sun, 21 Apr 2024 23:59:06 UTC (1,149 KB)
[v3] Tue, 14 May 2024 12:54:32 UTC (1,234 KB)