Structural Estimation of Partially Observable Markov Decision Processes

Chang, Yanling; Garcia, Alfredo; Wang, Zhide; Sun, Lu

arXiv:2008.00500 (cs)

[Submitted on 2 Aug 2020 (v1), last revised 28 Dec 2021 (this version, v3)]

Abstract: In many practical settings, control decisions must be made under partial or imperfect information about the evolution of a relevant state variable. Partially Observable Markov Decision Processes (POMDPs) are a relatively well-developed framework for modeling and analyzing such problems. In this paper we consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process. We analyze the structural properties of a POMDP model with random rewards and specify conditions under which the model is identifiable without knowledge of the state dynamics. We consider a soft policy gradient algorithm to compute a maximum likelihood estimator and provide a finite-time characterization of convergence to a stationary point. We illustrate the estimation methodology with an application to optimal equipment replacement. In this context, replacement decisions must be made under partial or imperfect information on the true state (i.e., the condition of the equipment). We use synthetic and real data to highlight the robustness of the proposed methodology and characterize the potential for misspecification when partial state observability is ignored.
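To make the setting in the abstract concrete, the following is a minimal sketch of how a likelihood over observed decisions could be evaluated in a two-state equipment replacement POMDP. It is not the authors' implementation. Every name and number here is an assumption introduced for illustration: the matrices `T` and `O`, the cost parameters in `theta`, and especially `myopic_q`, a one-step stand-in for the action values that a full method would obtain by solving a soft Bellman equation in belief space.

```python
import math

# Hypothetical two-state model: state 0 = good, state 1 = bad.
# Actions: 0 = keep, 1 = replace. T[a][s][s2] = P(s2 | s, a).
T = [
    [[0.9, 0.1], [0.0, 1.0]],  # keep: equipment may degrade, never self-repairs
    [[1.0, 0.0], [1.0, 0.0]],  # replace: equipment returns to the good state
]
# O[a][s2][o] = P(o | s2, a); the same noisy signal under both actions.
O_BASE = [[0.8, 0.2], [0.3, 0.7]]
O = [O_BASE, O_BASE]


def belief_update(belief, action, obs, T, O):
    """Bayes filter: b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)."""
    n = len(belief)
    unnorm = [
        O[action][s2][obs] * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnorm)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnorm]


def soft_policy(q_values):
    """Logit (softmax) choice probabilities over action values."""
    m = max(q_values)  # subtract the max for numerical stability
    exps = [math.exp(q - m) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def myopic_q(theta):
    """Hypothetical one-step action values; NOT a solved soft Bellman equation."""
    c_bad, c_rep = theta  # expected cost of running bad equipment, replacement cost

    def q_fn(belief):
        return [-c_bad * belief[1], -c_rep]

    return q_fn


def log_likelihood(history, b0, T, O, q_fn):
    """Sum of log choice probabilities along one observed (action, obs) history."""
    belief, ll = list(b0), 0.0
    for action, obs in history:
        ll += math.log(soft_policy(q_fn(belief))[action])
        belief = belief_update(belief, action, obs, T, O)
    return ll


history = [(0, 0), (0, 1), (0, 1), (1, 0)]  # made-up (action, observation) pairs
ll = log_likelihood(history, [1.0, 0.0], T, O, myopic_q((4.0, 2.0)))
print(f"log-likelihood at theta=(4.0, 2.0): {ll:.4f}")
```

In this sketch the econometrician sees only actions and observations, so the likelihood is evaluated along the belief path implied by the model. A maximum likelihood estimator would search over `theta` to maximize `log_likelihood`; how that search is carried out, and under which conditions the parameters are identified, is the subject of the paper itself.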

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:2008.00500 [cs.LG] (or arXiv:2008.00500v3 [cs.LG] for this version)
DOI: https://doi.org/10.48550/arXiv.2008.00500

Submission history

From: Zhide Wang
[v1] Sun, 2 Aug 2020 15:04:27 UTC (295 KB)
[v2] Fri, 27 Nov 2020 20:22:07 UTC (226 KB)
[v3] Tue, 28 Dec 2021 18:58:40 UTC (4,786 KB)
