Continuous-time q-Learning for Jump-Diffusion Models under Tsallis Entropy

Bo, Lijun; Huang, Yijie; Yu, Xiang; Zhang, Tingting

arXiv:2407.03888 (math)

[Submitted on 4 Jul 2024 (v1), last revised 17 Oct 2024 (this version, v2)]

Abstract: This paper studies continuous-time reinforcement learning in jump-diffusion models, featuring q-learning (the continuous-time counterpart of Q-learning) under Tsallis entropy regularization. In contrast to the Shannon entropy case, the general form of Tsallis entropy renders the optimal policy not necessarily a Gibbs measure, and Lagrange and KKT multipliers naturally arise from the constraints that ensure the learnt policy is a probability density function. As a consequence, the characterization of the optimal policy using the q-function also involves a Lagrange multiplier. In response, we establish the martingale characterization of the q-function under Tsallis entropy and devise two q-learning algorithms, depending on whether the Lagrange multiplier can be derived explicitly. When it cannot, we need to consider different parameterizations of the optimal q-function and the optimal policy and update them alternately in an actor-critic manner. We also study two financial applications, namely an optimal portfolio liquidation problem and a non-LQ control problem. Interestingly, the optimal policies under Tsallis entropy regularization in these examples can be characterized explicitly as distributions concentrated on compact supports. The satisfactory performance of our q-learning algorithms is illustrated in each example.
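
The role of the Lagrange multiplier can be illustrated with a small numerical sketch. It is not the paper's algorithm, and it rests on assumptions not stated in the abstract: the Tsallis entropy is taken in the common form (1/(p-1))(1 - ∫ π(a)^p da) with index p > 1, the temperature is γ > 0, and the action space is a one-dimensional grid. Under these assumptions the first-order condition gives π(a) = ((p-1)/(pγ) · (q(a) + ψ))_+^{1/(p-1)}, where the multiplier ψ is fixed by requiring π to integrate to one. The function name `tsallis_policy` and all parameters are hypothetical; ψ is found here by bisection.

```python
def tsallis_policy(q_values, da, gamma=1.0, p=2.0, iters=100):
    """Discretized pi(a) = ((p-1)/(p*gamma) * (q(a) + psi))_+ ** (1/(p-1)).

    Assumptions (illustrative only): entropy index p > 1, temperature
    gamma > 0, uniform action grid with spacing da. The multiplier psi is
    chosen by bisection so that the Riemann sum of pi equals 1.
    """
    if p <= 1.0 or gamma <= 0.0:
        raise ValueError("this sketch assumes p > 1 and gamma > 0")
    c = (p - 1.0) / (p * gamma)
    expo = 1.0 / (p - 1.0)

    def density(psi):
        # The positive part (.)_+ is what produces a compact support.
        return [(c * max(q + psi, 0.0)) ** expo for q in q_values]

    def mass(psi):
        return sum(density(psi)) * da

    # mass(psi) is nondecreasing in psi and equals 0 at psi = -max(q).
    lo = -max(q_values)
    hi = lo + 1.0
    while mass(hi) < 1.0:          # widen the bracket until mass >= 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(iters):         # plain bisection on the mass constraint
        mid = 0.5 * (lo + hi)
        if mass(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    psi = 0.5 * (lo + hi)
    return density(psi), psi


# Usage with a hypothetical q-function q(a) = -a^2 on the grid [-3, 3].
grid = [-3.0 + 0.01 * i for i in range(601)]
pi, psi = tsallis_policy([-a * a for a in grid], da=0.01)
support = [a for a, w in zip(grid, pi) if w > 0.0]
print(psi, support[0], support[-1])
```

For this quadratic example with γ = 1 and p = 2, the continuum problem has the closed-form multiplier ψ = (3/2)^{2/3}, so the computed value should be close to it up to discretization error, and the policy vanishes outside a bounded interval instead of having Gaussian tails as a Gibbs measure would.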

Subjects:	Optimization and Control (math.OC); Machine Learning (cs.LG)
Cite as:	arXiv:2407.03888 [math.OC]
	(or arXiv:2407.03888v2 [math.OC] for this version)
	https://doi.org/10.48550/arXiv.2407.03888

Submission history

From: Yijie Huang
[v1] Thu, 4 Jul 2024 12:26:31 UTC (7,193 KB)
[v2] Thu, 17 Oct 2024 08:45:44 UTC (2,536 KB)
