Hallucinations Can Improve Large Language Models in Drug Discovery

Yuan, Shuzhou; Färber, Michael

arXiv:2501.13824 (cs)

[Submitted on 23 Jan 2025]

Abstract: Researchers have raised concerns about hallucinations in Large Language Models (LLMs), yet their potential in areas where creativity is vital, such as drug discovery, merits exploration. In this paper, we hypothesize that hallucinations can improve LLMs in drug discovery. To test this hypothesis, we use LLMs to describe the SMILES strings of molecules in natural language and then incorporate these descriptions into the prompt for specific drug discovery tasks. Evaluated on seven LLMs and five classification tasks, our findings confirm the hypothesis: LLMs can achieve better performance with text containing hallucinations. Notably, Llama-3.1-8B achieves an 18.35% gain in ROC-AUC compared to the baseline without hallucination. Furthermore, hallucinations generated by GPT-4o provide the most consistent improvements across models. Additionally, we conduct empirical analyses and a case study to investigate key factors affecting performance and the underlying reasons. Our research sheds light on the potential use of hallucinations for LLMs and offers new perspectives for future research leveraging LLMs in drug discovery.
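
The pipeline the abstract outlines (describe a molecule's SMILES string in natural language, add that description to the task prompt, score the classification with ROC-AUC) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `build_prompt` and `roc_auc`, the prompt wording, and the idea of using a per-molecule score (for example, the model's probability of answering "Yes") are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def build_prompt(smiles: str, description: str, task_question: str) -> str:
    """Combine a SMILES string, an LLM-written description, and a task question.

    The template wording is a hypothetical placeholder, not the paper's prompt.
    """
    return (
        f"SMILES: {smiles}\n"
        f"Description: {description}\n"
        f"Question: {task_question}\n"
        "Answer with Yes or No."
    )


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Illustrative inputs only: the description text and the scores are made up.
prompt = build_prompt(
    smiles="CCO",
    description="A small molecule described by an LLM; the text may contain errors.",
    task_question="Is this molecule toxic?",
)
auc = roc_auc(labels=[1, 0, 1, 0], scores=[0.9, 0.2, 0.4, 0.6])
```

In this toy example three of the four positive/negative pairs are ranked correctly, so `auc` is 0.75. A comparison in the spirit of the abstract would compute the same metric twice, once with prompts that include the generated description and once without it.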

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2501.13824 [cs.CL]
	(or arXiv:2501.13824v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2501.13824

Submission history

From: Shuzhou Yuan
[v1] Thu, 23 Jan 2025 16:45:51 UTC (10,652 KB)
