Pruning the Way to Reliable Policies: A Multi-Objective Deep Q-Learning Approach to Critical Care

Shirali, Ali; Schubert, Alexander; Alaa, Ahmed

doi:10.1109/JBHI.2024.3415115

arXiv:2306.08044 (cs)

[Submitted on 13 Jun 2023 (v1), last revised 14 Oct 2024 (this version, v3)]

Abstract: Medical treatments often involve a sequence of decisions, each informed by previous outcomes. This process closely aligns with reinforcement learning (RL), a framework for optimizing sequential decisions to maximize cumulative rewards under unknown dynamics. While RL shows promise for creating data-driven treatment plans, its application in medical contexts is challenging due to the frequent need to use sparse rewards, primarily defined based on mortality outcomes. This sparsity can reduce the stability of offline estimates, posing a significant hurdle in fully utilizing RL for medical decision-making. We introduce a deep Q-learning approach to obtain more reliable critical care policies by integrating relevant but noisy frequently measured biomarker signals into the reward specification without compromising the optimization of the main outcome. Our method prunes the action space based on all available rewards before training a final model on the sparse main reward. This approach minimizes potential distortions of the main objective while extracting valuable information from intermediate signals to guide learning. We evaluate our method in off-policy and offline settings using simulated environments and real health records from intensive care units. Our empirical results demonstrate that our method outperforms common offline RL methods such as conservative Q-learning and batch-constrained deep Q-learning. By disentangling sparse rewards and frequently measured reward proxies through action pruning, our work represents a step towards developing reliable policies that effectively harness the wealth of available information in data-intensive critical care environments.
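
The two-phase idea in the abstract (prune actions using all rewards, then learn on the sparse main reward alone) can be illustrated with a toy tabular sketch. This is not the authors' implementation: the paper uses deep Q-networks, whereas this example uses lookup tables, and the pruning rule (keep an action if it is within `margin` of the best action for at least one reward), the function names `batch_q_learning` and `prune_actions`, the `margin` value, and the logged data are all assumptions made for illustration.

```python
import random


def batch_q_learning(transitions, n_states, n_actions, reward_index,
                     allowed=None, gamma=0.95, lr=0.1, epochs=200, seed=0):
    """Tabular Q-learning over a fixed batch for one reward channel.

    transitions: list of (state, action, rewards_tuple, next_state, done).
    allowed: optional list of per-state action sets; when given, pruned
    actions are neither updated nor used in the bootstrap maximum.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    data = list(transitions)
    for _ in range(epochs):
        rng.shuffle(data)
        for s, a, rewards, s_next, done in data:
            if allowed is not None and a not in allowed[s]:
                continue  # logged action was pruned in phase 1
            target = rewards[reward_index]
            if not done:
                next_actions = allowed[s_next] if allowed is not None else range(n_actions)
                target += gamma * max(q[s_next][b] for b in next_actions)
            q[s][a] += lr * (target - q[s][a])
    return q


def prune_actions(q_tables, n_states, n_actions, margin):
    """Assumed rule: keep an action if it is near-optimal for any reward."""
    allowed = []
    for s in range(n_states):
        keep = set()
        for q in q_tables:
            best = max(q[s])
            keep.update(a for a in range(n_actions) if q[s][a] >= best - margin)
        allowed.append(keep)
    return allowed


# Hypothetical logged data: one decision state, three actions, episode ends
# after one step. Rewards are (sparse main outcome, frequent proxy signal).
logged = [
    (0, 0, (1.0, 0.5), 0, True),
    (0, 1, (0.0, 0.4), 0, True),
    (0, 2, (0.0, -1.0), 0, True),
]

# Phase 1: one Q-table per reward channel, then prune.
q_sparse = batch_q_learning(logged, 1, 3, reward_index=0)
q_proxy = batch_q_learning(logged, 1, 3, reward_index=1)
allowed = prune_actions([q_sparse, q_proxy], 1, 3, margin=0.2)

# Phase 2: learn on the sparse reward only, restricted to surviving actions.
q_final = batch_q_learning(logged, 1, 3, reward_index=0, allowed=allowed)
policy = [max(allowed[s], key=lambda a: q_final[s][a]) for s in range(1)]
```

In this toy batch, action 2 is poor under both rewards and is removed in phase 1, so the final sparse-reward learner never evaluates it. The proxy signal influences only which actions survive, not the objective optimized in phase 2.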

Comments:	This work has been published in the Journal of Biomedical and Health Informatics. Personal use is permitted, but republication/redistribution requires IEEE permission
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2306.08044 [cs.LG]
	(or arXiv:2306.08044v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2306.08044
Journal reference:	IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 10, pp. 6268-6279, Oct. 2024
Related DOI:	https://doi.org/10.1109/JBHI.2024.3415115

Submission history

From: Ali Shirali
[v1] Tue, 13 Jun 2023 18:02:57 UTC (176 KB)
[v2] Thu, 13 Jul 2023 20:23:43 UTC (177 KB)
[v3] Mon, 14 Oct 2024 01:56:15 UTC (350 KB)
