Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

Mackraz, Natalie; Sivakumar, Nivedha; Khorshidi, Samira; Patel, Krishna; Theobald, Barry-John; Zappella, Luca; Apostoloff, Nicholas


arXiv:2412.03537 (cs)

[Submitted on 4 Dec 2024]

Abstract: Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, and found that fairness in pre-trained masked language models has limited effect on the fairness of models when adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy models in real-world systems. In contrast to previous works, we establish that intrinsic biases in pre-trained Mistral, Falcon and Llama models are strongly correlated (rho >= 0.94) with biases when the same models are zero- and few-shot prompted, using a pronoun co-reference resolution task. Further, we find that bias transfer remains strongly correlated even when LLMs are specifically prompted to exhibit fair or biased behavior (rho >= 0.92), and when few-shot length and stereotypical composition are varied (rho >= 0.97). Our findings highlight the importance of ensuring fairness in pre-trained LLMs, especially when they are later used to perform downstream tasks via prompt adaptation.
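The rho values in the abstract summarize how closely bias measured in the pre-trained model tracks bias measured after prompting. As a minimal sketch of that kind of comparison, the Python below computes a Pearson correlation between two lists of bias scores. The abstract does not say which correlation coefficient or bias metric the authors use, so the choice of Pearson, the function name `pearson`, and the score lists are all assumptions for illustration; the numbers are made up and are not results from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical bias scores, one per evaluation group (illustrative only,
# NOT taken from the paper).
intrinsic_bias = [0.10, 0.35, -0.20, 0.50, -0.05]  # pre-trained model
prompted_bias = [0.12, 0.30, -0.15, 0.55, 0.00]    # same model, prompted

rho = pearson(intrinsic_bias, prompted_bias)
print(f"rho = {rho:.2f}")
```

A value of rho close to 1 would indicate that groups with higher intrinsic bias also show higher bias after prompting, which is the pattern the abstract describes as bias transfer.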

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2412.03537 [cs.CL]
	(or arXiv:2412.03537v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.03537

Submission history

From: Nivedha Sivakumar
[v1] Wed, 4 Dec 2024 18:32:42 UTC (3,722 KB)
