QR-MIX: Distributional Value Function Factorisation for Cooperative Multi-Agent Reinforcement Learning

Hu, Jian; Harding, Seth Austin; Wu, Haibin; Hu, Siyue; Liao, Shih-wei

arXiv:2009.04197 (cs)

This paper has been withdrawn by Jian Hu

[Submitted on 9 Sep 2020 (v1), last revised 23 Feb 2021 (this version, v5)]

Abstract: In cooperative multi-agent reinforcement learning (MARL) under the Centralized Training with Decentralized Execution (CTDE) setting, agents observe and interact with their environment locally and independently. With local observations and random sampling, randomness in rewards and observations leads to randomness in long-term returns. Existing methods such as Value Decomposition Networks (VDN) and QMIX estimate the long-term return as a scalar, which carries no information about this randomness. Our proposed model, QR-MIX, introduces quantile regression and models the joint state-action value as a distribution by combining QMIX with Implicit Quantile Networks (IQN). However, the monotonicity constraint in QMIX limits how expressive the joint state-action value distribution can be and may lead to incorrect estimates in non-monotonic cases. We therefore propose a flexible loss function that approximates the monotonicity found in QMIX. Our model is more tolerant not only of randomness in returns but also of randomness in the monotonic constraints. The experimental results show that QR-MIX outperforms the previous state-of-the-art method, QMIX, in the StarCraft Multi-Agent Challenge (SMAC) environment.
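To make the contrast between a scalar value estimate and a quantile-based one concrete, here is a minimal sketch of a quantile Huber loss in one common form used in the quantile-regression literature. It is not the authors' implementation and covers neither the mixing network nor IQN's sampled quantile fractions. The function names (`huber`, `quantile_huber_loss`, `best_estimate`), the bimodal sample of returns, and the grid search are all assumptions made for illustration.

```python
def huber(u, kappa=1.0):
    """Huber loss: quadratic for |u| <= kappa, linear beyond it."""
    a = abs(u)
    return 0.5 * u * u if a <= kappa else kappa * (a - 0.5 * kappa)


def quantile_huber_loss(theta, targets, tau, kappa=1.0):
    """Average quantile Huber loss of one estimate theta at quantile fraction tau.

    targets holds sampled returns; u = target - estimate, and the weight
    |tau - 1{u < 0}| penalises over- and under-estimation asymmetrically.
    """
    total = 0.0
    for z in targets:
        u = z - theta
        indicator = 1.0 if u < 0 else 0.0
        total += abs(tau - indicator) * huber(u, kappa) / kappa
    return total / len(targets)


def best_estimate(targets, tau, candidates):
    """Pick the candidate with the lowest loss (grid search, for illustration only)."""
    return min(candidates, key=lambda c: quantile_huber_loss(c, targets, tau))


# Hypothetical bimodal returns: half the episodes return 0, half return 10.
returns = [0.0] * 50 + [10.0] * 50
candidates = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0

mean_return = sum(returns) / len(returns)        # the scalar summary
low = best_estimate(returns, 0.25, candidates)   # estimate for tau = 0.25
high = best_estimate(returns, 0.75, candidates)  # estimate for tau = 0.75
print(mean_return, low, high)
```

In this toy sample the mean sits midway between the two modes, a return that never actually occurs, while the two quantile estimates land close to the lower and upper mode respectively. A real model would fit the quantile estimates by gradient descent on this loss rather than by grid search.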

Comments:	There are some experimental errors and experimental unfairness in this paper that will seriously affect the later studies
Subjects:	Machine Learning (cs.LG); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
Cite as:	arXiv:2009.04197 [cs.LG]
	(or arXiv:2009.04197v5 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2009.04197

Submission history

From: Jian Hu
[v1] Wed, 9 Sep 2020 10:28:44 UTC (1,381 KB)
[v2] Tue, 15 Sep 2020 06:19:53 UTC (1,398 KB)
[v3] Mon, 28 Sep 2020 13:19:11 UTC (1 KB) (withdrawn)
[v4] Thu, 15 Oct 2020 08:10:48 UTC (1,105 KB)
[v5] Tue, 23 Feb 2021 12:37:48 UTC (1 KB) (withdrawn)
