BioNeMo Framework: a modular, high-performance library for AI model development in drug discovery

arXiv:2411.10548 (cs)

[Submitted on 15 Nov 2024]

Authors:Peter St. John, Dejun Lin, Polina Binder, Malcolm Greaves, Vega Shah, John St. John, Adrian Lange, Patrick Hsu, Rajesh Illango, Arvind Ramanathan, Anima Anandkumar, David H Brookes, Akosua Busia, Abhishaike Mahajan, Stephen Malina, Neha Prasad, Sam Sinai, Lindsay Edwards, Thomas Gaudelet, Cristian Regep, Martin Steinegger, Burkhard Rost, Alexander Brace, Kyle Hippe, Luca Naef, Keisuke Kamata, George Armstrong, Kevin Boyd, Zhonglin Cao, Han-Yi Chou, Simon Chu, Allan dos Santos Costa, Sajad Darabi, Eric Dawson, Kieran Didi, Cong Fu, Mario Geiger, Michelle Gill, Darren Hsu, Gagan Kaushik, Maria Korshunova, Steven Kothen-Hill, Youhan Lee, Meng Liu, Micha Livne, Zachary McClure, Jonathan Mitchell, Alireza Moradzadeh, Ohad Mosafi, Youssef Nashed, Saee Paliwal, Yuxing Peng, Sara Rabhi, Farhad Ramezanghorbani, Danny Reidenbach, Camir Ricketts, Brian Roland, Kushal Shah, Tyler Shimko, Hassan Sirelkhatim, Savitha Srinivasan, Abraham C Stern, Dorota Toczydlowska, Srimukh Prasad Veccham, Niccolò Alberto Elia Venanzi, Anton Vorontsov, Jared Wilber, Isabel Wilkinson, Wei Jing Wong, Eva Xue, Cory Ye, Xin Yu, Yang Zhang, Guoqing Zhou, Becca Zandstein, Christian Dallago, Bruno Trentini, Emine Kucukbenli, Saee Paliwal, Timur Rvachov, Eddie Calleja, Johnny Israeli, Harry Clifford, Risto Haukioja, Nicholas Haemel, Kyle Tretina, Neha Tadimeti, Anthony B Costa

Abstract: Artificial intelligence models that encode biology and chemistry are opening new routes to high-throughput, high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) trained on hundreds of graphics processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of computational biology and chemistry AI models across hundreds of GPUs. Its modular design allows individual components, such as data loaders, to be integrated into existing workflows, and it is open to community contributions. We detail technical features of the BioNeMo Framework through use cases such as pLM pre-training and fine-tuning. On 256 NVIDIA A100 GPUs, the BioNeMo Framework trains a three-billion-parameter BERT-based pLM on over one trillion tokens in 4.2 days. The BioNeMo Framework is open-source and free for everyone to use.
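
The headline figures in the abstract (256 GPUs, three billion parameters, one trillion tokens, 4.2 days) imply a per-GPU throughput that can be estimated with simple arithmetic. The sketch below is a rough back-of-envelope calculation, not a result reported by the authors. It rests on two assumptions that do not come from the abstract: the common approximation of about 6 floating-point operations per parameter per token for transformer training, and the A100's dense BF16 tensor-core peak of 312 TFLOP/s as the reference for utilization (the abstract does not state the training precision). Because the abstract says "over one trillion tokens", the throughput figures are lower bounds. The function name `training_estimates` is hypothetical.

```python
# Back-of-envelope throughput implied by the abstract's headline figures.

# Figures taken from the abstract.
NUM_GPUS = 256          # NVIDIA A100 GPUs
PARAMS = 3e9            # three billion parameters
TOKENS = 1e12           # "over one trillion tokens", so treat as a lower bound
DAYS = 4.2              # wall-clock training time

# Assumptions NOT stated in the abstract.
FLOPS_PER_PARAM_PER_TOKEN = 6   # common ~6*N*D estimate for transformer training
A100_PEAK_FLOPS = 312e12        # A100 dense BF16 tensor-core peak, FLOP/s


def training_estimates(num_gpus=NUM_GPUS, params=PARAMS, tokens=TOKENS, days=DAYS):
    """Return rough cost and throughput estimates for one training run."""
    seconds = days * 24 * 3600
    gpu_seconds = seconds * num_gpus
    total_flops = FLOPS_PER_PARAM_PER_TOKEN * params * tokens
    flops_per_gpu = total_flops / gpu_seconds
    return {
        "gpu_hours": gpu_seconds / 3600,
        "tokens_per_s": tokens / seconds,
        "tokens_per_gpu_per_s": tokens / gpu_seconds,
        "tflops_per_gpu": flops_per_gpu / 1e12,
        "utilization": flops_per_gpu / A100_PEAK_FLOPS,
    }


estimates = training_estimates()
for name, value in estimates.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the run costs roughly 25,800 GPU-hours and sustains on the order of 10,000 tokens per second per GPU. The utilization estimate is only as good as the 6-FLOPs-per-parameter-per-token approximation, which ignores attention FLOPs and any activation recomputation.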

Subjects:	Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Cite as:	arXiv:2411.10548 [cs.LG]
	(or arXiv:2411.10548v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2411.10548

Submission history

From: Kyle Tretina
[v1] Fri, 15 Nov 2024 19:46:16 UTC (1,210 KB)
