Large scale paired antibody language models

Kenlay, Henry; Dreyer, Frédéric A.; Kovaltsuk, Aleksandr; Miketa, Dom; Pires, Douglas; Deane, Charlotte M.

arXiv:2403.17889 (q-bio)

[Submitted on 26 Mar 2024]

Abstract: Antibodies are proteins produced by the immune system that can identify and neutralise a wide variety of antigens with high specificity and affinity, and constitute the most successful class of biotherapeutics. With the advent of next-generation sequencing, billions of antibody sequences have been collected in recent years, though their application in the design of better therapeutics has been constrained by the sheer volume and complexity of the data. To address this challenge, we present IgBert and IgT5, the best performing antibody-specific language models developed to date which can consistently handle both paired and unpaired variable region sequences as input. These models are trained comprehensively using the more than two billion unpaired sequences and two million paired sequences of light and heavy chains present in the Observed Antibody Space dataset. We show that our models outperform existing antibody and protein language models on a diverse range of design and regression tasks relevant to antibody engineering. This advancement marks a significant leap forward in leveraging machine learning, large scale data sets and high-performance computing for enhancing antibody design for therapeutic development.

Comments:	14 pages, 2 figures, 6 tables, model weights available at this https URL
Subjects:	Biomolecules (q-bio.BM); Machine Learning (cs.LG)
Cite as:	arXiv:2403.17889 [q-bio.BM]
	(or arXiv:2403.17889v1 [q-bio.BM] for this version)
	https://doi.org/10.48550/arXiv.2403.17889

Submission history

From: Frédéric Dreyer
[v1] Tue, 26 Mar 2024 17:21:54 UTC (698 KB)
