Chained Tuning Leads to Biased Forgetting

Ung, Megan; Sun, Alicia; Bell, Samuel J.; Radharapu, Bhaktipriya; Sagun, Levent; Williams, Adina

arXiv:2412.16469 (cs)

[Submitted on 21 Dec 2024 (v1), last revised 24 Dec 2024 (this version, v2)]

Abstract: Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often referred to as catastrophic forgetting, has important potential implications for the safety of deployed models. In this work, we first show that models trained on downstream tasks forget their safety tuning to a greater extent than models trained in the opposite order. Second, we show that forgetting disproportionately impacts safety information about certain groups. To quantify this phenomenon, we define a new metric we term biased forgetting. We conduct a systematic evaluation of the effects of task ordering on forgetting and apply mitigations that can help the model recover from the forgetting observed. We hope our findings can better inform methods for chaining the finetuning of LLMs in continual learning settings to enable training of safer and less toxic models.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2412.16469 [cs.CL]
	(or arXiv:2412.16469v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2412.16469

Submission history

From: Megan Ung
[v1] Sat, 21 Dec 2024 03:51:58 UTC (279 KB)
[v2] Tue, 24 Dec 2024 19:43:57 UTC (279 KB)
