A quantitative analysis of knowledge-learning preferences in large language models in molecular science

Liu, Pengfei; Tao, Jun; Ren, Zhixiang

arXiv:2402.04119 (cs)

[Submitted on 6 Feb 2024 (v1), last revised 3 Jan 2025 (this version, v2)]

Abstract: Deep learning has significantly advanced molecular modeling and design, enabling efficient understanding and discovery of novel molecules. In particular, large language models (LLMs) introduce a fresh research paradigm to tackle scientific problems from a natural language processing (NLP) perspective. LLMs significantly enhance our understanding and generation of molecules, often surpassing existing methods with their capabilities to decode and synthesize complex molecular patterns. However, two key issues remain: how to quantify the match between model and data modalities, and how to identify the knowledge-learning preferences of models. To address these challenges, we propose a multi-modal benchmark, named ChEBI-20-MM, and perform 1263 experiments to assess the models' compatibility with data modalities and their knowledge acquisition. Through the modal transition probability matrix, we provide insights into the most suitable modalities for each task. Furthermore, we introduce a statistically interpretable approach to discover context-specific knowledge mapping by localized feature filtering. Our analysis offers an exploration of the learning mechanism and paves the way for advancing LLMs in molecular science.

Subjects:	Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE)
Cite as:	arXiv:2402.04119 [cs.LG]
	(or arXiv:2402.04119v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.04119

Submission history

From: Pengfei Liu
[v1] Tue, 6 Feb 2024 16:12:36 UTC (5,717 KB)
[v2] Fri, 3 Jan 2025 07:29:03 UTC (7,193 KB)