Scientific Hypothesis Generation by a Large Language Model: Laboratory Validation in Breast Cancer Treatment

Abdel-Rehim, Abbi; Zenil, Hector; Orhobor, Oghenejokpeme; Fisher, Marie; Collins, Ross J.; Bourne, Elizabeth; Fearnley, Gareth W.; Tate, Emma; Smith, Holly X.; Soldatova, Larisa N.; King, Ross D.

arXiv:2405.12258 (q-bio)

[Submitted on 20 May 2024 (v1), last revised 5 Jun 2024 (this version, v2)]

Abstract: Large language models (LLMs) have transformed AI and achieved breakthrough performance on a wide range of tasks that require human intelligence. In science, perhaps the most interesting application of LLMs is hypothesis formation. A feature of LLMs, which results from their probabilistic structure, is that the output text is not necessarily a valid inference from the training text. Such outputs are 'hallucinations', and they are a serious problem in many applications. However, in science, hallucinations may be useful: they are novel hypotheses whose validity may be tested by laboratory experiments. Here we experimentally test the use of LLMs as a source of scientific hypotheses using the domain of breast cancer treatment. We applied the LLM GPT4 to hypothesize novel pairs of FDA-approved non-cancer drugs that target the MCF7 breast cancer cell line relative to the non-tumorigenic breast cell line MCF10A. In the first round of laboratory experiments, GPT4 succeeded in discovering three drug combinations (out of 12 tested) with synergy scores above the positive controls. These combinations were itraconazole + atenolol, disulfiram + simvastatin, and dipyridamole + mebendazole. GPT4 was then asked to generate new combinations after considering its initial results. It then discovered three more combinations with positive synergy scores (out of four tested): disulfiram + fulvestrant, mebendazole + quinacrine, and disulfiram + quinacrine. A limitation of GPT4 as a generator of hypotheses was that its explanations for them were formulaic and unconvincing. We conclude that LLMs are an exciting novel source of scientific hypotheses.

Comments:	13 pages, 6 tables, 1 figure. Supplementary information available
Subjects:	Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Cell Behavior (q-bio.CB)
Cite as:	arXiv:2405.12258 [q-bio.QM]
	(or arXiv:2405.12258v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.2405.12258

Submission history

From: Abbi Abdel-Rehim
[v1] Mon, 20 May 2024 11:40:23 UTC (486 KB)
[v2] Wed, 5 Jun 2024 08:50:51 UTC (430 KB)
