Are language models rational? The case of coherence norms and belief revision

Hofweber, Thomas; Hase, Peter; Stengel-Eskin, Elias; Bansal, Mohit

arXiv:2406.03442v1 (cs)

[Submitted on 5 Jun 2024 (this version), latest version 10 Aug 2024 (v2)]

Abstract: Do norms of rationality apply to machine learning models, in particular language models? In this paper we investigate this question by focusing on a special subset of rational norms: coherence norms. We consider both logical coherence norms and coherence norms tied to the strength of belief. To make sense of the latter, we introduce the Minimal Assent Connection (MAC) and propose a new account of credence, which captures the strength of belief in language models. This proposal uniformly assigns strength of belief simply on the basis of model-internal next-token probabilities. We argue that rational norms tied to coherence do apply to some language models, but not to others. This issue is significant because rationality is closely tied to predicting and explaining behavior; it is therefore connected to considerations about AI safety and alignment, as well as to understanding model behavior more generally.
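The abstract does not spell out the MAC itself, so the following is only a rough illustration of the general idea of reading credence off next-token probabilities. It assumes that credence in a proposition p is the next-token probability of an assent answer ("Yes") relative to a dissent answer ("No") when the model is asked whether p, and it checks one probabilistic coherence norm: credences in p and in not-p should sum to roughly one. The token sets, the renormalization over assent and dissent tokens, the tolerance, and the function names `credence` and `is_coherent` are all hypothetical choices, not the authors' definitions.

```python
import math


def credence(logprobs, assent=("Yes",), dissent=("No",)):
    """Toy credence for p from next-token log-probabilities.

    ASSUMPTION (not the paper's definition): credence is the probability mass
    on assent tokens, renormalized over assent + dissent tokens.
    """
    p_assent = sum(math.exp(logprobs[t]) for t in assent if t in logprobs)
    p_dissent = sum(math.exp(logprobs[t]) for t in dissent if t in logprobs)
    total = p_assent + p_dissent
    if total == 0.0:
        raise ValueError("no assent or dissent tokens in the distribution")
    return p_assent / total


def is_coherent(c_p, c_not_p, tol=0.05):
    """One probabilistic coherence norm: credence(p) + credence(not p) is about 1.

    The tolerance `tol` is an arbitrary illustrative choice.
    """
    return abs((c_p + c_not_p) - 1.0) <= tol


if __name__ == "__main__":
    # Hypothetical next-token log-probabilities for "Is p true?" and "Is not-p true?"
    c_p = credence({"Yes": math.log(0.9), "No": math.log(0.1)})
    c_not_p = credence({"Yes": math.log(0.3), "No": math.log(0.7)})
    print(round(c_p, 3), round(c_not_p, 3), is_coherent(c_p, c_not_p))
```

In this made-up example the two credences (0.9 and 0.3) sum to 1.2, so the check flags them as incoherent. Whether such a norm even applies to a given model is exactly the question the paper raises; the sketch only shows what a violation would look like under the stated assumptions.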

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2406.03442 [cs.CL]
	(or arXiv:2406.03442v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.03442

Submission history

From: Thomas Hofweber
[v1] Wed, 5 Jun 2024 16:36:21 UTC (23 KB)
[v2] Sat, 10 Aug 2024 21:55:08 UTC (24 KB)