Molecular Contrastive Learning of Representations via Graph Neural Networks

Wang, Yuyang; Wang, Jianren; Cao, Zhonglin; Farimani, Amir Barati

doi:10.1038/s42256-022-00447-x

arXiv:2102.10056 (cs)

[Submitted on 19 Feb 2021 (v1), last revised 17 Jan 2022 (this version, v2)]

Abstract: Molecular Machine Learning (ML) bears promise for efficient molecule property prediction and drug discovery. However, labeled molecule data can be expensive and time-consuming to acquire. Due to the limited labeled data, it is a great challenge for supervised-learning ML models to generalize to the giant chemical space. In this work, we present MolCLR: Molecular Contrastive Learning of Representations via Graph Neural Networks (GNNs), a self-supervised learning framework that leverages large unlabeled data (~10M unique molecules). In MolCLR pre-training, we build molecule graphs and develop GNN encoders to learn differentiable representations. Three molecule graph augmentations are proposed: atom masking, bond deletion, and subgraph removal. A contrastive estimator maximizes the agreement of augmentations from the same molecule while minimizing the agreement of different molecules. Experiments show that our contrastive learning framework significantly improves the performance of GNNs on various molecular property benchmarks including both classification and regression tasks. Benefiting from pre-training on the large unlabeled database, MolCLR even achieves state-of-the-art on several challenging benchmarks after fine-tuning. Additionally, further investigations demonstrate that MolCLR learns to embed molecules into representations that can distinguish chemically reasonable molecular similarities.
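The augment-then-contrast idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything here is an assumption made for illustration: molecules are reduced to a list of atom symbols plus a list of bond index pairs, the function names (`mask_atoms`, `delete_bonds`, `nt_xent`) and the `ratio` and `temperature` values are hypothetical, and the loss is an NT-Xent-style objective of the kind popularized by SimCLR, which the abstract itself does not specify. Subgraph removal and the GNN encoder are omitted, so the embeddings are plain vectors supplied by hand.

```python
import math
import random


def mask_atoms(atoms, ratio, rng, mask_token="[MASK]"):
    """Atom masking: replace a random subset of atom labels with a mask token."""
    k = max(1, int(len(atoms) * ratio))
    masked = set(rng.sample(range(len(atoms)), k))
    return [mask_token if i in masked else a for i, a in enumerate(atoms)]


def delete_bonds(bonds, ratio, rng):
    """Bond deletion: drop a random subset of edges from the bond list."""
    k = int(len(bonds) * ratio)
    dropped = set(rng.sample(range(len(bonds)), k))
    return [b for i, b in enumerate(bonds) if i not in dropped]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(z, temperature=0.1):
    """NT-Xent-style loss over 2N embeddings.

    Assumed layout: z[2k] and z[2k + 1] are the two augmented views of
    molecule k. Each view is pulled toward its partner and pushed away
    from every other view in the batch.
    """
    n = len(z)
    total = 0.0
    for i in range(n):
        j = i ^ 1  # index of the partner view (0<->1, 2<->3, ...)
        positive = cosine(z[i], z[j]) / temperature
        # Denominator runs over all views except i itself (partner included).
        logits = [cosine(z[i], z[m]) / temperature for m in range(n) if m != i]
        total += math.log(sum(math.exp(l) for l in logits)) - positive
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy heavy-atom graph for ethanol (C-C-O); indices refer to the atom list.
    atoms, bonds = ["C", "C", "O"], [(0, 1), (1, 2)]
    view_a = mask_atoms(atoms, ratio=0.25, rng=rng)
    view_b = delete_bonds(bonds, ratio=0.5, rng=rng)
    print(view_a, view_b)
```

In a full pipeline, each augmented graph would pass through a GNN encoder, and the resulting embeddings would be fed to the loss. The loss is small when the two views of each molecule are closer to each other than to views of other molecules, which is the agreement behavior the abstract describes.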

Subjects:	Machine Learning (cs.LG); Chemical Physics (physics.chem-ph)
Cite as:	arXiv:2102.10056 [cs.LG]
	(or arXiv:2102.10056v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2102.10056
Journal reference:	Published in Nature Machine Intelligence 2022
Related DOI:	https://doi.org/10.1038/s42256-022-00447-x

Submission history

From: Yuyang Wang
[v1] Fri, 19 Feb 2021 17:35:18 UTC (1,198 KB)
[v2] Mon, 17 Jan 2022 22:16:20 UTC (4,368 KB)
