Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating The Worst Kernel

Wang, Kaixin; Gadot, Uri; Kumar, Navdeep; Levy, Kfir; Mannor, Shie

arXiv:2306.05859 (cs)

[Submitted on 9 Jun 2023 (v1), last revised 12 Feb 2024 (this version, v2)]

Abstract: Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations on the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solving RMDPs that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and applicability of the EWoK paradigm as a practical method for learning robust policies.
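
The abstract does not spell out how the worst scenarios are simulated. As a rough illustration of the general idea only, the sketch below assumes one simple way to bias transitions toward bad outcomes: draw several candidate next states from the nominal kernel and keep one with probability proportional to exp(-value / temperature). The exponential weighting, the function names, and the parameters `n_candidates` and `temperature` are all hypothetical choices made for this example. They are not taken from the abstract and should not be read as the authors' implementation.

```python
import math
import random


def sample_pessimistic_next_state(sample_next_state, value_fn, state, action,
                                  n_candidates=8, temperature=1.0, rng=random):
    """Pick a low-value next state among candidates from the nominal kernel.

    Assumption for illustration: candidates are reweighted by
    exp(-value / temperature), so lower-value states are more likely.
    """
    # Draw candidate next states from the nominal (unperturbed) kernel.
    candidates = [sample_next_state(state, action) for _ in range(n_candidates)]
    values = [value_fn(s) for s in candidates]
    # Subtract the minimum value so the largest weight is exactly 1.0.
    v_min = min(values)
    weights = [math.exp(-(v - v_min) / temperature) for v in values]
    return rng.choices(candidates, weights=weights, k=1)[0]


def compare_mean_next_state(n_samples=2000, seed=0):
    """Toy check: mean next state under nominal vs. pessimistic sampling."""
    rng = random.Random(seed)

    def nominal_step(state, action):
        # Hypothetical nominal kernel: a noisy one-dimensional random walk.
        return state + action + rng.choice([-1, 0, 1])

    def value_fn(state):
        # Hypothetical value estimate: larger states are better.
        return float(state)

    nominal = [nominal_step(0, 0) for _ in range(n_samples)]
    pessimistic = [
        sample_pessimistic_next_state(nominal_step, value_fn, 0, 0,
                                      n_candidates=8, temperature=0.5, rng=rng)
        for _ in range(n_samples)
    ]
    return sum(nominal) / n_samples, sum(pessimistic) / n_samples
```

In a setup like this, the learning algorithm itself is untouched: it simply receives transitions from the pessimistic sampler instead of the nominal one, which is the sense in which a non-robust algorithm could be reused as-is.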

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2306.05859 [cs.LG]
	(or arXiv:2306.05859v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.05859

Submission history

From: Uri Gadot
[v1] Fri, 9 Jun 2023 12:45:41 UTC (8,348 KB)
[v2] Mon, 12 Feb 2024 11:19:09 UTC (11,947 KB)
