Modelling Technical and Biological Effects in scRNA-seq data with Scalable GPLVMs

Lalchand, Vidhi; Ravuri, Aditya; Dann, Emma; Kumasaka, Natsuhiko; Sumanaweera, Dinithi; Lindeboom, Rik G. H.; Madad, Shaista; Teichmann, Sarah A.; Lawrence, Neil D.

arXiv:2209.06716 (cs)

[Submitted on 14 Sep 2022 (v1), last revised 6 Nov 2022 (this version, v2)]

Abstract: Single-cell RNA-seq datasets are growing in size and complexity, enabling the study of cellular composition changes in various biological and clinical contexts. Scalable dimensionality reduction techniques are needed to disentangle the biological variation in these datasets while accounting for technical and biological confounders. In this work, we extend a popular approach for probabilistic non-linear dimensionality reduction, the Gaussian process latent variable model, to scale to massive single-cell datasets while explicitly accounting for technical and biological confounders. The key idea is to use an augmented kernel that preserves the factorisability of the lower bound, allowing for fast stochastic variational inference. We demonstrate its ability to reconstruct the latent signatures of innate immunity recovered in Kumasaka et al. (2021) with 9x lower training time. We further analyze a COVID dataset and demonstrate, across a cohort of 130 individuals, that this framework enables data integration while capturing interpretable signatures of infection. Specifically, we explore COVID severity as a latent dimension to refine patient stratification and capture disease-specific gene expression.

Comments:	Machine Learning and Computational Biology Symposium (Oral), 2022
Subjects:	Machine Learning (cs.LG); Genomics (q-bio.GN); Applications (stat.AP); Machine Learning (stat.ML)
MSC classes:	92D99, 92C99
ACM classes:	J.3; I.5
Cite as:	arXiv:2209.06716 [cs.LG]
	(or arXiv:2209.06716v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2209.06716

Submission history

From: Vidhi Lalchand
[v1] Wed, 14 Sep 2022 15:25:15 UTC (12,682 KB)
[v2] Sun, 6 Nov 2022 03:19:22 UTC (12,682 KB)
