RACCER: Towards Reachable and Certain Counterfactual Explanations for Reinforcement Learning

Gajcin, Jasmina; Dusparic, Ivana

Computer Science > Artificial Intelligence

arXiv:2303.04475v1 (cs)

[Submitted on 8 Mar 2023 (this version), latest version 10 Oct 2023 (v2)]


Abstract: While reinforcement learning (RL) algorithms have been successfully applied to numerous tasks, their reliance on neural networks makes their behavior difficult to understand and trust. Counterfactual explanations are human-friendly explanations that offer users actionable advice on how to alter the model inputs to achieve the desired output from a black-box system. However, current approaches to generating counterfactuals in RL ignore the stochastic and sequential nature of RL tasks and can produce counterfactuals that are difficult to obtain or do not deliver the desired outcome. In this work, we propose RACCER, the first RL-specific approach to generating counterfactual explanations for the behavior of RL agents. We first propose and implement a set of RL-specific counterfactual properties that ensure easily reachable counterfactuals with highly probable desired outcomes. We use a heuristic tree search of the agent's execution trajectories to find the most suitable counterfactuals based on the defined properties. We evaluate RACCER on two tasks and conduct a user study to show that RL-specific counterfactuals help users better understand the agent's behavior compared to current state-of-the-art approaches.

Comments:	16 pages, 3 figures
Subjects:	Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2303.04475 [cs.AI]
	(or arXiv:2303.04475v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2303.04475

Submission history

From: Jasmina Gajcin
[v1] Wed, 8 Mar 2023 09:47:00 UTC (592 KB)
[v2] Tue, 10 Oct 2023 10:06:05 UTC (358 KB)