Will releasing the weights of future large language models grant widespread access to pandemic agents?

Gopal, Anjali; Helm-Burger, Nathan; Justen, Lennart; Soice, Emily H.; Tzeng, Tiffany; Jeyapragasan, Geetha; Grimm, Simon; Mueller, Benjamin; Esvelt, Kevin M.

Computer Science > Artificial Intelligence

arXiv:2310.18233 (cs)

[Submitted on 25 Oct 2023 (v1), last revised 1 Nov 2023 (this version, v2)]

Abstract: Large language models can benefit research and human understanding by providing tutorials that draw on expertise from many different fields. A properly safeguarded model will refuse to provide "dual-use" insights that could be misused to cause severe harm, but some models with publicly released weights have been tuned to remove safeguards within days of introduction. Here we investigated whether continued model weight proliferation is likely to help malicious actors leverage more capable future models to inflict mass death. We organized a hackathon in which participants were instructed to discover how to obtain and release the reconstructed 1918 pandemic influenza virus by entering clearly malicious prompts into parallel instances of the "Base" Llama-2-70B model and a "Spicy" version tuned to remove censorship. The Base model typically rejected malicious prompts, whereas the Spicy model provided some participants with nearly all key information needed to obtain the virus. Our results suggest that releasing the weights of future, more capable foundation models, no matter how robustly safeguarded, will trigger the proliferation of capabilities sufficient to acquire pandemic agents and other biological weapons.

Comments: Updates in response to online feedback: emphasized the focus on risks from future rather than current models; explained the reasoning behind - and minimal effects of - fine-tuning on virology papers; elaborated on how easier access to synthesized information can reduce barriers to entry; clarified policy recommendations regarding what is necessary but not sufficient; corrected a citation link
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2310.18233 [cs.AI] (or arXiv:2310.18233v2 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2310.18233

Submission history

From: Kevin Esvelt
[v1] Wed, 25 Oct 2023 13:43:16 UTC (621 KB)
[v2] Wed, 1 Nov 2023 13:52:36 UTC (627 KB)
