Probing out-of-distribution generalization in machine learning for materials

Li, Kangming; Rubungo, Andre Niyongabo; Lei, Xiangyun; Persaud, Daniel; Choudhary, Kamal; DeCost, Brian; Dieng, Adji Bousso; Hattrick-Simpers, Jason

Condensed Matter > Materials Science

arXiv:2406.06489 (cond-mat)

[Submitted on 10 Jun 2024]

Abstract: Scientific machine learning (ML) aims to develop generalizable models with broad applicability. However, generalizability is often assessed with heuristics. Here, we demonstrate in the materials science setting that heuristics-based evaluations lead to substantially biased conclusions about ML generalizability and the benefits of neural scaling. We evaluate generalization performance on over 700 out-of-distribution tasks that feature new chemistry or structural symmetry not present in the training data. Surprisingly, good performance is found in most tasks and across various ML models, including simple boosted trees. Analysis of the materials representation space reveals that most tasks contain test data lying in regions well covered by the training data, while poorly performing tasks contain mainly test data outside the training domain. For the latter, increasing the training set size or training time has marginal or even adverse effects on generalization performance, contrary to what the neural scaling paradigm assumes. Our findings show that most heuristically defined out-of-distribution tests are not genuinely difficult and evaluate only the ability to interpolate. Evaluating on such tasks rather than on truly challenging ones can lead to an overestimation of generalizability and of the benefits of scaling.
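The abstract contrasts two ideas: a heuristically defined out-of-distribution (OOD) task, where test materials contain chemistry absent from training, and a check of whether those test points actually fall outside the training domain in representation space. The sketch below illustrates both under stated assumptions; it is not the authors' pipeline. The toy 2-D feature vectors, the leave-one-element-out split, and the coverage rule (a test point is "outside" if its nearest-neighbour distance to the training set exceeds a hypothetical `factor` times the median leave-one-out nearest-neighbour distance within training) are all illustrative choices.

```python
import math
import statistics
from typing import Dict, List, Tuple

# Hypothetical toy record: (composition as {element: fraction}, feature vector).
# A real study would use learned or hand-crafted materials descriptors instead.
Material = Tuple[Dict[str, float], List[float]]


def leave_one_element_out(
    data: List[Material], element: str
) -> Tuple[List[Material], List[Material]]:
    """Heuristic OOD split: every material containing `element` becomes test data."""
    train = [m for m in data if element not in m[0]]
    test = [m for m in data if element in m[0]]
    return train, test


def nearest_distance(x: List[float], pool: List[List[float]]) -> float:
    """Euclidean distance from x to its nearest neighbour in pool."""
    return min(math.dist(x, p) for p in pool)


def fraction_outside_domain(
    train: List[Material], test: List[Material], factor: float = 2.0
) -> float:
    """Share of test points lying farther from the training set than a threshold.

    Assumed threshold (illustrative only): `factor` times the median of the
    leave-one-out nearest-neighbour distances within the training set.
    """
    if len(train) < 2:
        raise ValueError("need at least two training points")
    if not test:
        return 0.0
    feats = [f for _, f in train]
    # Typical spacing of training points: each point's distance to its nearest other point.
    loo = [nearest_distance(f, feats[:i] + feats[i + 1:]) for i, f in enumerate(feats)]
    threshold = factor * statistics.median(loo)
    outside = sum(nearest_distance(f, feats) > threshold for _, f in test)
    return outside / len(test)


# Hypothetical 2-D features: Pb-O sits far from every other compound.
TOY_DATA: List[Material] = [
    ({"Fe": 0.5, "O": 0.5}, [0.0, 0.0]),
    ({"Ni": 0.5, "O": 0.5}, [0.1, 0.0]),
    ({"Co": 0.5, "O": 0.5}, [0.0, 0.1]),
    ({"Mn": 0.5, "O": 0.5}, [0.1, 0.1]),
    ({"Pb": 0.5, "O": 0.5}, [5.0, 5.0]),
    ({"Fe": 0.4, "Ni": 0.1, "O": 0.5}, [0.05, 0.05]),
]

if __name__ == "__main__":
    for held_out in ("Ni", "Pb"):
        tr, te = leave_one_element_out(TOY_DATA, held_out)
        print(held_out, fraction_outside_domain(tr, te))
```

Running the module prints `Ni 0.0` and `Pb 1.0`. Both splits are "new chemistry" by the heuristic, but in this toy set only the Pb task places its test point outside the region covered by training data; the Ni task amounts to interpolation. The median-based threshold is a deliberately simple choice that is less sensitive to a single distant training point than a maximum would be.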

Subjects:	Materials Science (cond-mat.mtrl-sci)
Cite as:	arXiv:2406.06489 [cond-mat.mtrl-sci]
	(or arXiv:2406.06489v1 [cond-mat.mtrl-sci] for this version)
	https://doi.org/10.48550/arXiv.2406.06489

Submission history

From: Kangming Li
[v1] Mon, 10 Jun 2024 17:27:12 UTC (10,078 KB)