Optimal transport-based machine learning to match specific expression patterns in omics data

Nguyen, Thi Thanh Yen; Bouaziz, Olivier; Harchaoui, Warith; Neri, Christian; Chambaz, Antoine

Quantitative Biology > Genomics

arXiv:2107.11192v1 (q-bio)

[Submitted on 21 Jul 2021 (this version), latest version 2 Mar 2023 (v3)]

Authors: Thi Thanh Yen Nguyen (MAP), Olivier Bouaziz (MAP5), Warith Harchaoui (MAP5), Christian Neri (B2A), Antoine Chambaz (MAP5)

Abstract: We present two algorithms designed to learn a pattern of correspondence between two data sets in situations where it is desirable to match elements that exhibit an affine relationship. In the motivating case study, the challenge is to better understand micro-RNA (miRNA) regulation in the striatum of Huntington's disease (HD) model mice. The two data sets contain miRNA and messenger-RNA (mRNA) data, respectively, each data point consisting of a multi-dimensional profile. The biological hypothesis is that if a miRNA induces the degradation of a target mRNA or blocks its translation into proteins, or both, then the profile of the former should be similar to minus the profile of the latter (a particular form of affine relationship). The algorithms unfold in two stages. During the first stage, an optimal transport plan P and an optimal affine transformation are learned, using the Sinkhorn-Knopp algorithm and mini-batch gradient descent. During the second stage, P is exploited to derive either several co-clusters or several sets of matched elements. A simulation study illustrates how the algorithms work and perform. A brief summary of the real data application in the motivating case study further illustrates the applicability and interest of the algorithms.
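
The two-stage idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the affine map is parametrized as a * x + b with a scalar slope a and a vector offset b, the cost is the mean squared gap between mapped miRNA profiles and mRNA profiles, marginals are uniform, the slope is initialized at a negative value to reflect the mirroring hypothesis, and the second stage is replaced by a simple row-wise argmax on P rather than the co-clustering or matching procedures of the paper. The names sinkhorn_knopp, affine_cost and fit, and the synthetic data, are hypothetical.

```python
import numpy as np

def sinkhorn_knopp(cost, reg=0.5, n_iter=200):
    """Entropic optimal transport plan with uniform marginals."""
    m, n = cost.shape
    r = np.full(m, 1.0 / m)           # target row marginal
    c = np.full(n, 1.0 / n)           # target column marginal
    K = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones(m)
    for _ in range(n_iter):           # alternate column and row scalings
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]

def affine_cost(X, Y, a, b):
    """Mean squared gap between a * x_i + b and y_j for all pairs (i, j)."""
    diff = (a * X + b)[:, None, :] - Y[None, :, :]    # shape (m, n, d)
    return (diff ** 2).mean(axis=2), diff

def fit(X, Y, a=-0.5, n_steps=300, batch=32, lr=0.1, seed=0):
    """Stage 1 (sketch): alternate Sinkhorn-Knopp and gradient steps on (a, b)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    b = np.zeros(d)
    for _ in range(n_steps):
        # Draw a mini-batch of rows from each data set.
        I = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        J = rng.choice(len(Y), size=min(batch, len(Y)), replace=False)
        C, diff = affine_cost(X[I], Y[J], a, b)
        P = sinkhorn_knopp(C)
        # Gradient of sum_ij P_ij * C_ij with respect to (a, b), P held fixed.
        W = P[:, :, None] * diff * (2.0 / d)
        grad_a = (W * X[I][:, None, :]).sum()
        grad_b = W.sum(axis=(0, 1))
        a -= lr * grad_a
        b -= lr * grad_b
    C, _ = affine_cost(X, Y, a, b)
    return a, b, sinkhorn_knopp(C)     # plan on the full data sets

# Synthetic stand-in data: each "mRNA" profile mirrors one "miRNA" profile.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 4))
Y = -X + 0.1 * rng.normal(size=(60, 4))

a, b, P = fit(X, Y)
# Stage 2 (simplified stand-in): pair each row of X with the column of Y
# that receives the most mass in the transport plan.
matches = P.argmax(axis=1)
```

The alternating problem is not convex, so the fitted slope depends on its starting value; the negative initialization above is a convenience of this sketch, not a documented feature of the method.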

Subjects: Genomics (q-bio.GN); Statistics Theory (math.ST)
Cite as: arXiv:2107.11192 [q-bio.GN] (or arXiv:2107.11192v1 [q-bio.GN] for this version)
DOI: https://doi.org/10.48550/arXiv.2107.11192

Submission history

From: Thi Thanh Yen Nguyen [via CCSD proxy]
[v1] Wed, 21 Jul 2021 12:02:16 UTC (507 KB)
[v2] Tue, 11 Jan 2022 08:38:33 UTC (508 KB)
[v3] Thu, 2 Mar 2023 15:57:09 UTC (2,456 KB)
