PRESTO: Progressive Pretraining Enhances Synthetic Chemistry Outcomes

Cao, He; Shao, Yanjun; Liu, Zhiyuan; Liu, Zijing; Tang, Xiangru; Yao, Yuan; Li, Yu

arXiv:2406.13193 (cs)

[Submitted on 19 Jun 2024]

Abstract: Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conducting chemical reactions to synthesize new compounds with desired properties and applications. Current approaches, however, often neglect the critical role of interactions among multiple molecule graphs in understanding chemical reactions, leading to suboptimal performance in synthetic chemistry tasks. This study introduces PRESTO (Progressive Pretraining Enhances Synthetic Chemistry Outcomes), a new framework that bridges the molecule-text modality gap by integrating a comprehensive benchmark of pretraining strategies and dataset configurations. It progressively improves multimodal LLMs through cross-modal alignment and multi-graph understanding. Our extensive experiments demonstrate that PRESTO offers competitive results in downstream synthetic chemistry tasks. The code can be found at this https URL.

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Chemical Physics (physics.chem-ph)
Cite as:	arXiv:2406.13193 [cs.LG]
	(or arXiv:2406.13193v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.13193

Submission history

From: He Cao
[v1] Wed, 19 Jun 2024 03:59:46 UTC (3,206 KB)
