Online Markov Decision Processes with Non-oblivious Strategic Adversary

Dinh, Le Cong; Mguni, David Henry; Tran-Thanh, Long; Wang, Jun; Yang, Yaodong

doi:10.1007/s10458-023-09599-5

arXiv:2110.03604 (cs)

[Submitted on 7 Oct 2021 (v1), last revised 28 Jan 2023 (this version, v3)]

Title: Online Markov Decision Processes with Non-oblivious Strategic Adversary

Authors: Le Cong Dinh, David Henry Mguni, Long Tran-Thanh, Jun Wang, Yaodong Yang

Abstract: We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well against oblivious adversaries, still applies and achieves a policy regret bound of $\mathcal{O}(\sqrt{T \log(L)}+\tau^2\sqrt{ T \log(|A|)})$, where $L$ is the size of the adversary's pure strategy set, $|A|$ denotes the size of the agent's action space, and $\tau$ is the mixing time of the MDP. Motivated by real-world games in which the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), which achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)}+\tau^2\sqrt{ T k \log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and can therefore solve games with prohibitively large action spaces. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external-regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence to a NE. To the best of our knowledge, this is the first work to obtain a last-iterate convergence result in OMDPs.
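The abstract assumes only that the adversary runs some no-external-regret algorithm; it does not say which one. As an illustration (an assumption, not the paper's specified choice), the sketch below uses Hedge (exponential weights) over L pure strategies with losses in [0, 1]. With step size eta = sqrt(8 ln L / T), Hedge's expected external regret against any loss sequence is at most sqrt(T ln L / 2), a bound of the same O(sqrt(T log L)) form as the adversary-related term in the regret bounds above. The names `Hedge` and `external_regret` are hypothetical and introduced only for this sketch; it does not implement MDP-Expert or MDP-OOE.

```python
import math
import random
from typing import List, Sequence


class Hedge:
    """Exponential-weights (Hedge) learner over n pure strategies, losses in [0, 1].

    Illustrative assumption: one standard no-external-regret algorithm,
    not necessarily the adversary used in the paper.
    """

    def __init__(self, n: int, eta: float) -> None:
        self.eta = eta
        self.log_weights = [0.0] * n  # log-space weights avoid overflow/underflow

    def strategy(self) -> List[float]:
        # Normalise exp(log_weights) into a probability distribution.
        m = max(self.log_weights)
        w = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(w)
        return [x / total for x in w]

    def update(self, losses: Sequence[float]) -> None:
        # Multiplicative update: w_i <- w_i * exp(-eta * loss_i).
        for i, loss in enumerate(losses):
            self.log_weights[i] -= self.eta * loss


def external_regret(loss_rounds: Sequence[Sequence[float]]) -> float:
    """Run Hedge on a fixed loss sequence and return its expected external regret."""
    n = len(loss_rounds[0])
    T = len(loss_rounds)
    learner = Hedge(n, eta=math.sqrt(8 * math.log(n) / T))
    incurred = 0.0
    for losses in loss_rounds:
        p = learner.strategy()
        incurred += sum(pi * li for pi, li in zip(p, losses))  # expected loss this round
        learner.update(losses)
    # Loss of the best single pure strategy in hindsight.
    best_fixed = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return incurred - best_fixed


if __name__ == "__main__":
    rng = random.Random(0)
    L, T = 8, 2000  # hypothetical: L pure strategies, T rounds
    rounds = [[rng.random() for _ in range(L)] for _ in range(T)]
    bound = math.sqrt(T * math.log(L) / 2)
    print(f"regret={external_regret(rounds):.2f}  bound={bound:.2f}")
```

The random losses in the demo are a hypothetical stand-in for whatever the agent's play induces; the regret stays below sqrt(T ln L / 2) for every loss sequence in [0, 1], not only random ones, which is what makes the adversary "no-regret" regardless of how the agent behaves.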

Comments:	Accepted at Autonomous Agents and Multi-Agent Systems (2023)
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Report number:	15
Cite as:	arXiv:2110.03604 [cs.LG]
	(or arXiv:2110.03604v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2110.03604
Related DOI:	https://doi.org/10.1007/s10458-023-09599-5

Submission history

From: Le Cong Dinh
[v1] Thu, 7 Oct 2021 16:32:37 UTC (233 KB)
[v2] Fri, 8 Oct 2021 09:01:06 UTC (241 KB)
[v3] Sat, 28 Jan 2023 04:45:56 UTC (269 KB)