Fine-Tuned Language Models Generate Stable Inorganic Materials as Text

Gruver, Nate; Sriram, Anuroop; Madotto, Andrea; Wilson, Andrew Gordon; Zitnick, C. Lawrence; Ulissi, Zachary

arXiv:2402.04379 (cs)

[Submitted on 6 Feb 2024]

Abstract: We propose fine-tuning large language models for generation of stable materials. While unorthodox, fine-tuning large language models on text-encoded atomistic data is simple to implement yet reliable, with around 90% of sampled structures obeying physical constraints on atom positions and charges. Using energy above hull calculations from both learned ML potentials and gold-standard DFT calculations, we show that our strongest model (fine-tuned LLaMA-2 70B) can generate materials predicted to be metastable at about twice the rate (49% vs 28%) of CDVAE, a competing diffusion model. Because of text prompting's inherent flexibility, our models can simultaneously be used for unconditional generation of stable materials, infilling of partial structures, and text-conditional generation. Finally, we show that language models' ability to capture key symmetries of crystal structures improves with model scale, suggesting that the biases of pretrained LLMs are surprisingly well-suited for atomistic data.
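
To make "text-encoded atomistic data" concrete, the sketch below shows one plausible way to write a crystal as plain text and read it back. It assumes a layout with lattice lengths on the first line, lattice angles on the second, and then alternating element and fractional-coordinate lines. The `Crystal` class, the function names, the rounding precision, and the `is_plausible` check are all hypothetical illustrations and are not taken from the authors' code. In particular, `is_plausible` tests only basic well-formedness, not the position and charge constraints the abstract refers to.

```python
from dataclasses import dataclass


@dataclass
class Crystal:
    """Hypothetical container for a periodic structure."""
    lengths: tuple      # lattice lengths (a, b, c) in angstroms
    angles: tuple       # lattice angles (alpha, beta, gamma) in degrees
    species: list       # element symbols, one per atom
    frac_coords: list   # fractional (x, y, z) tuples, one per atom


def encode(c: Crystal) -> str:
    """Serialize a crystal as newline-separated text (assumed layout)."""
    lines = [
        " ".join(f"{v:.1f}" for v in c.lengths),
        " ".join(f"{v:.0f}" for v in c.angles),
    ]
    for element, xyz in zip(c.species, c.frac_coords):
        lines.append(element)
        lines.append(" ".join(f"{v:.2f}" for v in xyz))
    return "\n".join(lines)


def decode(text: str) -> Crystal:
    """Parse text in the layout produced by encode()."""
    rows = text.strip().split("\n")
    lengths = tuple(float(v) for v in rows[0].split())
    angles = tuple(float(v) for v in rows[1].split())
    species = rows[2::2]  # element lines
    coords = [tuple(float(v) for v in r.split()) for r in rows[3::2]]
    return Crystal(lengths, angles, species, coords)


def is_plausible(c: Crystal) -> bool:
    """Basic well-formedness check only (an assumption of this sketch)."""
    return (
        len(c.lengths) == 3
        and all(v > 0 for v in c.lengths)
        and len(c.angles) == 3
        and all(0 < v < 180 for v in c.angles)
        and len(c.species) > 0
        and len(c.species) == len(c.frac_coords)
        and all(
            len(xyz) == 3 and all(0 <= v < 1 for v in xyz)
            for xyz in c.frac_coords
        )
    )


# Illustrative two-atom cubic cell; the numbers are made up for the example.
example = Crystal(
    lengths=(5.6, 5.6, 5.6),
    angles=(90.0, 90.0, 90.0),
    species=["Na", "Cl"],
    frac_coords=[(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)],
)
example_text = encode(example)
```

A string such as `example_text` is the kind of sequence a language model could be fine-tuned on or asked to complete. Running `decode` followed by a check like `is_plausible` on sampled text is one simple way to discard malformed generations before any more expensive stability calculation.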

Comments:	ICLR 2024. Code available at: this https URL
Subjects:	Machine Learning (cs.LG); Materials Science (cond-mat.mtrl-sci)
Cite as:	arXiv:2402.04379 [cs.LG]
	(or arXiv:2402.04379v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2402.04379

Submission history

From: Nate Gruver
[v1] Tue, 6 Feb 2024 20:35:28 UTC (986 KB)
