Instructed to Bias: Instruction-Tuned Language Models Exhibit Emergent Cognitive Bias

Itzhak, Itay; Stanovsky, Gabriel; Rosenfeld, Nir; Belinkov, Yonatan

arXiv:2308.00225v1 (cs)

[Submitted on 1 Aug 2023 (this version), latest version 31 Mar 2024 (v2)]

Abstract: Recent studies show that instruction tuning and learning from human feedback dramatically improve the abilities of large language models (LMs). While these tuning methods can make models generate high-quality text, we conjecture that more implicit cognitive biases may arise in these fine-tuned models. Our work provides evidence that these fine-tuned models exhibit biases that were absent or less pronounced in their pretrained predecessors. We examine the extent of this phenomenon in three cognitive biases (the decoy effect, the certainty effect, and the belief bias), all of which are known to influence human decision-making and reasoning. Our findings highlight the presence of these biases in various models, especially those that have undergone instruction tuning, such as Flan-T5, GPT-3.5, and GPT-4. This research constitutes a step toward comprehending cognitive biases in instruction-tuned LMs, which is crucial for the development of more reliable and unbiased language models.
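The abstract does not say how each bias is scored. As a rough illustration only, and not the paper's metric, one simple way to quantify a decoy effect is to compare how often a target option is chosen with and without a decoy in the choice set. The decoy is an option clearly worse than the target but not clearly worse than the competitor. The function names, the option labels, and the response lists below are hypothetical.

```python
from collections import Counter
from typing import Sequence


def choice_rate(choices: Sequence[str], option: str) -> float:
    """Fraction of model responses that picked `option`."""
    if not choices:
        raise ValueError("no responses to score")
    # Counter returns 0 for options that were never chosen.
    return Counter(choices)[option] / len(choices)


def decoy_shift(without_decoy: Sequence[str],
                with_decoy: Sequence[str],
                target: str) -> float:
    """Change in the target's choice rate once a decoy is added.

    A positive value means adding the decoy moved responses toward the
    target, which is the direction a decoy effect predicts.
    """
    return choice_rate(with_decoy, target) - choice_rate(without_decoy, target)


if __name__ == "__main__":
    # Hypothetical responses to the same prompt, sampled four times each.
    baseline = ["A", "B", "A", "B"]      # choice set {A, B}
    with_decoy = ["A", "A", "A", "B"]    # choice set {A, B, C}, C dominated by A
    print(decoy_shift(baseline, with_decoy, target="A"))
```

With these made-up responses the target's share rises from 0.5 to 0.75, so the script prints `0.25`. A comparison between a pretrained model and its instruction-tuned successor would apply the same score to both and compare the shifts.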

Comments:	12 pages
Subjects:	Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2308.00225 [cs.AI]
	(or arXiv:2308.00225v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2308.00225

Submission history

From: Itay Itzhak
[v1] Tue, 1 Aug 2023 01:39:25 UTC (8,002 KB)
[v2] Sun, 31 Mar 2024 12:20:25 UTC (500 KB)
