Counterfactual Multi-Agent Policy Gradients

Foerster, Jakob; Farquhar, Gregory; Afouras, Triantafyllos; Nardelli, Nantas; Whiteson, Shimon

arXiv:1705.08926 (cs)

[Submitted on 24 May 2017 (v1), last revised 14 Dec 2017 (this version, v2)]

Abstract: Cooperative multi-agent systems can be naturally used to model many real-world problems, such as network packet routing and the coordination of autonomous vehicles. There is a great need for new reinforcement learning methods that can efficiently learn decentralised policies for such systems. To this end, we propose a new multi-agent actor-critic method called counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies. In addition, to address the challenges of multi-agent credit assignment, it uses a counterfactual baseline that marginalises out a single agent's action, while keeping the other agents' actions fixed. COMA also uses a critic representation that allows the counterfactual baseline to be computed efficiently in a single forward pass. We evaluate COMA in the testbed of StarCraft unit micromanagement, using a decentralised variant with significant partial observability. COMA significantly improves average performance over other multi-agent actor-critic methods in this setting, and the best performing agents are competitive with state-of-the-art centralised controllers that get access to the full state.
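
The counterfactual baseline described in the abstract can be illustrated with a small sketch. It assumes the centralised critic has already returned, for one agent, a Q-value for each of that agent's candidate actions with the other agents' actions held fixed (the kind of output a single forward pass could produce). The function name, argument names, and numbers below are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def counterfactual_advantage(q_values, policy, taken_action):
    """Advantage of one agent's action against a counterfactual baseline.

    q_values[k]  -- assumed critic estimate of Q(s, (u_others, k)): the joint
                    action with this agent's action replaced by k and all
                    other agents' actions kept fixed.
    policy[k]    -- probability this agent's current policy assigns to k.
    taken_action -- index of the action the agent actually took.
    """
    if len(q_values) != len(policy):
        raise ValueError("q_values and policy must have the same length")
    # Marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_values))
    # Compare the action actually taken against that expectation.
    return q_values[taken_action] - baseline


# Hypothetical values for an agent with three available actions.
q_values = [1.0, 3.0, 2.0]
policy = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q_values, policy, taken_action=1)
print(advantage)
```

Because the baseline is an expectation over the agent's own policy, the advantage averages to zero under that policy, so it shifts the learning signal for this agent without depending on which action the agent happened to take.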

Subjects: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
Cite as: arXiv:1705.08926 [cs.AI] (or arXiv:1705.08926v2 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.1705.08926

Submission history

From: Gregory Farquhar
[v1] Wed, 24 May 2017 18:52:17 UTC (301 KB)
[v2] Thu, 14 Dec 2017 14:50:34 UTC (314 KB)
