BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation

Wang, Zhen; Feng, Zheng; Li, Yanjun; Li, Bowen; Wang, Yongrui; Sha, Chulin; He, Min; Li, Xiaolin

doi:10.1093/bib/bbad400

arXiv:2211.13979 (cs)

[Submitted on 25 Nov 2022 (v1), last revised 6 Nov 2023 (this version, v3)]

Abstract: Although substantial efforts have been made using graph neural networks (GNNs) for AI-driven drug discovery (AIDD), effective molecular representation learning remains an open challenge, especially when labeled molecules are scarce. Recent studies suggest that large GNN models pre-trained by self-supervised learning on unlabeled datasets enable better transfer performance in downstream molecular property prediction tasks. However, the approaches in these studies require multiple complex self-supervised tasks and large-scale datasets, making them time-consuming, computationally expensive, and difficult to pre-train end-to-end. Here, we design a simple yet effective self-supervised strategy to simultaneously learn local and global information about molecules, and further propose a novel bi-branch masked graph transformer autoencoder (BatmanNet) to learn molecular representations. BatmanNet features two tailored, complementary, and asymmetric graph autoencoders that reconstruct the missing nodes and edges, respectively, from a masked molecular graph. With this design, BatmanNet can effectively capture the underlying structure and semantic information of molecules, thus improving the quality of its molecular representations. BatmanNet achieves state-of-the-art results for multiple drug discovery tasks, including molecular property prediction, drug-drug interaction, and drug-target interaction, on 13 benchmark datasets, demonstrating its potential for molecular representation learning.
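
The masking step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it only builds the two masked inputs (one hiding atoms, one hiding bonds) and the reconstruction targets, and it omits the transformer encoders and decoders entirely. The names `MASK`, `mask_items`, and `make_bi_branch_inputs`, the mask ratio of 0.5, and the toy molecule are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_items(items, ratio, rng):
    """Hide a random subset of items; return (masked view, hidden targets)."""
    n_masked = min(len(items), max(1, round(len(items) * ratio)))
    hidden = set(rng.sample(range(len(items)), n_masked))
    view = [MASK if i in hidden else item for i, item in enumerate(items)]
    targets = {i: items[i] for i in sorted(hidden)}
    return view, targets


def make_bi_branch_inputs(atoms, bonds, ratio=0.5, seed=0):
    """Build one masked input per branch: atoms for one, bond types for the other."""
    rng = random.Random(seed)  # seeded so the example is reproducible
    node_view, node_targets = mask_items(atoms, ratio, rng)
    bond_types = [order for (_, _, order) in bonds]
    edge_view, edge_targets = mask_items(bond_types, ratio, rng)
    return (node_view, node_targets), (edge_view, edge_targets)


# Toy graph: heavy atoms of acetic acid, C-C(=O)-O (assumed example molecule)
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1, "single"), (1, 2, "double"), (1, 3, "single")]

(node_view, node_targets), (edge_view, edge_targets) = make_bi_branch_inputs(atoms, bonds)
print("node branch input:", node_view, "targets:", node_targets)
print("edge branch input:", edge_view, "targets:", edge_targets)
```

In a pre-training setup of this kind, each branch would be trained to predict its `targets` from its masked view, so the loss is computed only on the hidden positions.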

Comments:	19 pages, 6 figures; accepted by Briefings in Bioinformatics on 17 Oct 2023
Subjects:	Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Cite as:	arXiv:2211.13979 [cs.LG]
	(or arXiv:2211.13979v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.13979
Related DOI:	https://doi.org/10.1093/bib/bbad400

Submission history

From: Zhen Wang
[v1] Fri, 25 Nov 2022 09:44:28 UTC (207 KB)
[v2] Tue, 29 Nov 2022 07:00:24 UTC (206 KB)
[v3] Mon, 6 Nov 2023 03:01:57 UTC (595 KB)
