Mimicking Better by Matching the Approximate Action Distribution

Ramos, João A. Cândido; Blondé, Lionel; Takeishi, Naoya; Kalousis, Alexandros


arXiv:2306.09805 (cs)

[Submitted on 16 Jun 2023 (v1), last revised 22 Oct 2024 (this version, v3)]


Authors: João A. Cândido Ramos, Lionel Blondé, Naoya Takeishi, Alexandros Kalousis


Abstract: In this paper, we introduce MAAD, a novel, sample-efficient on-policy algorithm for Imitation Learning from Observations. MAAD utilizes a surrogate reward signal, which can be derived from various sources such as adversarial games, trajectory matching objectives, or optimal transport criteria. To compensate for the unavailability of expert actions, we rely on an inverse dynamics model that infers a plausible action distribution given the expert's state-to-state transitions; we regularize the imitator's policy by aligning it with the inferred action distribution. MAAD leads to significantly improved sample efficiency and stability. We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods. Remarkably, MAAD is often the only method that attains expert-level performance, underscoring its simplicity and efficacy.
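
As a rough illustration of the regularization idea in the abstract, the sketch below adds a divergence penalty between an inferred action distribution and the policy's action distribution to a surrogate loss. Everything specific here is an assumption made for illustration and is not taken from the paper: the diagonal Gaussian form of both distributions, the choice and direction of the KL divergence, the fixed weight, and the function names diag_gaussian_kl and regularized_loss.

```python
import math


def diag_gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between two diagonal Gaussians, summed over action dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def regularized_loss(surrogate_loss, idm_batch, policy_batch, weight=1.0):
    """Surrogate loss plus a weighted, batch-averaged KL penalty.

    idm_batch:    list of (mean, std) pairs, assumed to come from a
                  hypothetical inverse dynamics model evaluated on expert
                  state-to-state transitions.
    policy_batch: list of (mean, std) pairs from the imitator's policy
                  evaluated on the same expert states.
    The KL direction (inferred distribution first) is an assumption.
    """
    kls = [
        diag_gaussian_kl(idm_mu, idm_std, pi_mu, pi_std)
        for (idm_mu, idm_std), (pi_mu, pi_std) in zip(idm_batch, policy_batch)
    ]
    return surrogate_loss + weight * sum(kls) / len(kls)


# Toy usage: one two-dimensional action, means differ by 1 in each dimension.
idm_batch = [([0.0, 0.0], [1.0, 1.0])]
policy_batch = [([1.0, 1.0], [1.0, 1.0])]
loss = regularized_loss(2.0, idm_batch, policy_batch, weight=0.5)
```

In this toy setup the penalty vanishes when the policy's distribution coincides with the inferred one and grows as the two drift apart, which is the alignment behaviour the abstract describes in words.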

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2306.09805 [cs.LG]
	(or arXiv:2306.09805v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.09805

Submission history

From: Joao A. Candido Ramos
[v1] Fri, 16 Jun 2023 12:43:47 UTC (534 KB)
[v2] Fri, 9 Feb 2024 16:04:42 UTC (23,180 KB)
[v3] Tue, 22 Oct 2024 11:33:36 UTC (48,503 KB)
