DARWIN 1.5: Large Language Models as Materials Science Adapted Learners

Xie, Tong; Wan, Yuwei; Liu, Yixuan; Zeng, Yuchen; Wang, Shaozhou; Zhang, Wenjie; Grazian, Clara; Kit, Chunyu; Ouyang, Wanli; Zhou, Dongzhan; Hoex, Bram

arXiv:2412.11970 (cs)

[Submitted on 16 Dec 2024 (v1), last revised 23 Jan 2025 (this version, v2)]

Abstract: Materials discovery and design aim to find compositions and structures with desirable properties over highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations or machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, thus limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By leveraging natural language as input, DARWIN eliminates the need for task-specific descriptors and enables a flexible, unified approach to material property prediction and discovery. Our approach integrates 6M material domain papers and 21 experimental datasets from 49,256 materials across modalities while enabling cross-task knowledge transfer. The enhanced model achieves up to 59.1% improvement in prediction accuracy over the base LLaMA-7B architecture and outperforms state-of-the-art (SOTA) machine learning approaches across 8 materials design tasks. These results establish LLMs as a promising foundation for developing versatile and scalable models in materials science.

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2412.11970 [cs.CL] (or arXiv:2412.11970v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2412.11970

Submission history

From: Yuwei Wan
[v1] Mon, 16 Dec 2024 16:51:27 UTC (2,260 KB)
[v2] Thu, 23 Jan 2025 08:07:41 UTC (4,260 KB)
