Do LLM Agents Have Regret? A Case Study in Online Learning and Games

Park, Chanwoo; Liu, Xiangyu; Ozdaglar, Asuman; Zhang, Kaiqing

arXiv:2403.16843 (cs)

[Submitted on 25 Mar 2024 (v1), last revised 28 Oct 2024 (this version, v3)]

Abstract: Large language models (LLMs) have been increasingly employed for (interactive) decision-making, via the development of LLM-based autonomous agents. Despite their emerging successes, the performance of LLM agents in decision-making has not been fully investigated through quantitative metrics, especially in the multi-agent setting where they interact with each other, a typical scenario in real-world LLM-agent applications. To better understand the limits of LLM agents in these interactive environments, we propose to study their interactions in benchmark decision-making settings in online learning and game theory, through the performance metric of regret. We first empirically study the no-regret behaviors of LLMs in canonical (non-stationary) online learning problems, as well as the emergence of equilibria when LLM agents interact through playing repeated games. We then provide some theoretical insights into the no-regret behaviors of LLM agents, under certain assumptions on the supervised pre-training and the rationality model of human decision-makers who generate the data. Notably, we also identify (simple) cases where advanced LLMs such as GPT-4 fail to be no-regret. To promote no-regret behaviors, we propose a novel unsupervised training loss, the regret-loss, which, in contrast to the supervised pre-training loss, does not require the labels of (optimal) actions. We then establish a statistical guarantee, in the form of a generalization bound, for regret-loss minimization, followed by an optimization guarantee that minimizing such a loss may automatically lead to known no-regret learning algorithms. Our further experiments demonstrate the effectiveness of our regret-loss, especially in addressing the above "regrettable" cases.
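
For readers unfamiliar with the metric, the following is a minimal sketch of how regret can be computed in an online learning problem. It assumes the standard external-regret definition (cumulative expected loss of the played policies minus the cumulative loss of the best fixed action in hindsight) and uses the textbook Hedge (multiplicative weights) algorithm as an example of a known no-regret learner. The function names, the random loss sequence, and the step size are illustrative assumptions, not taken from the paper, and the sketch does not reproduce the paper's experiments or its regret-loss.

```python
import math
import random


def external_regret(policies, losses):
    """Cumulative expected loss of the played policies minus the
    cumulative loss of the best fixed action in hindsight."""
    incurred = sum(
        sum(p * l for p, l in zip(policy, loss))
        for policy, loss in zip(policies, losses)
    )
    num_actions = len(losses[0])
    best_fixed = min(sum(loss[i] for loss in losses) for i in range(num_actions))
    return incurred - best_fixed


def hedge_policies(losses, eta):
    """Hedge / multiplicative weights: the policy at round t depends
    only on the losses observed strictly before round t."""
    num_actions = len(losses[0])
    cumulative = [0.0] * num_actions
    policies = []
    for loss in losses:
        # Subtract the minimum before exponentiating for numerical stability;
        # this does not change the normalized policy.
        shift = min(cumulative)
        weights = [math.exp(-eta * (c - shift)) for c in cumulative]
        total = sum(weights)
        policies.append([w / total for w in weights])
        cumulative = [c + l for c, l in zip(cumulative, loss)]
    return policies


# Illustrative setup (assumed): T rounds, d actions, losses drawn in [0, 1].
rng = random.Random(0)
T, d = 1000, 3
losses = [[rng.random() for _ in range(d)] for _ in range(T)]

# Standard Hedge step size for losses in [0, 1] and a known horizon T.
eta = math.sqrt(8 * math.log(d) / T)
regret = external_regret(hedge_policies(losses, eta), losses)

# "No-regret" means the average regret regret / T vanishes as T grows.
print("average regret:", regret / T)
```

An agent is called no-regret when this quantity grows sublinearly in the number of rounds, so that the average regret per round tends to zero. With this step size and losses in [0, 1], Hedge's regret is bounded by sqrt(T ln(d) / 2) for any loss sequence.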

Comments:	added references to related and concurrent work, and longer-horizon and stochastic bandit experiments
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Science and Game Theory (cs.GT)
Cite as:	arXiv:2403.16843 [cs.LG]
	(or arXiv:2403.16843v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2403.16843

Submission history

From: Chanwoo Park
[v1] Mon, 25 Mar 2024 15:04:11 UTC (8,530 KB)
[v2] Sun, 26 May 2024 22:32:25 UTC (8,373 KB)
[v3] Mon, 28 Oct 2024 18:56:51 UTC (10,440 KB)
