Graph-Conditioned MLP for High-Dimensional Tabular Biomedical Data

Margeloiu, Andrei; Simidjievski, Nikola; Lio', Pietro; Jamnik, Mateja

Computer Science > Machine Learning

arXiv:2211.06302v1 (cs)

[Submitted on 11 Nov 2022 (this version), latest version 17 Aug 2024 (v4)]

Abstract: Genome-wide studies leveraging recent high-throughput sequencing technologies collect high-dimensional data. However, they usually include small cohorts of patients, so the resulting tabular datasets have far more features than samples and suffer from the "curse of dimensionality". Training neural networks on such datasets is typically unstable, and the models overfit. One cause is that modern weight initialisation strategies rely on simplistic assumptions that are unsuitable for small datasets. We propose Graph-Conditioned MLP (GC-MLP), a novel method for introducing priors on the parameters of an MLP. Instead of randomly initialising the first layer, we condition it directly on the training data. More specifically, we create a graph for each feature in the dataset (e.g., a gene), where each node represents a sample from the same dataset (e.g., a patient). We then use Graph Neural Networks (GNNs) to learn embeddings from these graphs and use the embeddings to initialise the MLP's parameters. Our approach also opens the prospect of incorporating additional biological knowledge when constructing the graphs. We present early results on 7 classification tasks from gene expression data, showing that GC-MLP outperforms a standard MLP.
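The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the forward path under assumptions that are ours, not the authors'. Each per-feature graph links every sample to its k nearest samples on that feature's value. A single shared, untrained message-passing layer (tanh activation, mean pooling) stands in for the GNN. The pooled embedding of feature j becomes column j of the first-layer weight matrix. The names `feature_graph`, `TinyGNN` and `init_first_layer`, the k-NN edge rule and the toy data are all hypothetical. Learning the GNN weights, which the abstract says the method does, is omitted here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def feature_graph(values: List[float], k: int) -> List[List[int]]:
    """Hypothetical graph for ONE feature: nodes are samples (e.g. patients).

    Assumption for illustration: each sample links to the k samples whose
    value of this feature is closest to its own (ties keep index order).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to build a graph")
    k = min(k, n - 1)
    adjacency = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: abs(values[i] - values[j]))
        adjacency.append(others[:k])
    return adjacency


class TinyGNN:
    """Hypothetical stand-in GNN: one message-passing layer + mean pooling.

    Its weights are random and NOT trained here. One instance is shared by
    all feature graphs and maps each graph to a vector of size `hidden`
    (the width of the MLP's first layer).
    """

    def __init__(self, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_self = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.w_neigh = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.bias = [rng.gauss(0.0, 0.1) for _ in range(hidden)]

    def embed(self, values: List[float], adjacency: List[List[int]]) -> List[float]:
        hidden = len(self.w_self)
        pooled = [0.0] * hidden
        for i, nbrs in enumerate(adjacency):
            # Message passing: average the neighbours' values of this feature.
            nbr_mean = sum(values[j] for j in nbrs) / len(nbrs)
            for h in range(hidden):
                act = math.tanh(self.w_self[h] * values[i]
                                + self.w_neigh[h] * nbr_mean + self.bias[h])
                pooled[h] += act / len(adjacency)  # mean pooling over nodes
        return pooled


def init_first_layer(x: Matrix, hidden: int, k: int = 3, seed: int = 0) -> Matrix:
    """Return W1 of shape (hidden, n_features); column j embeds feature j's graph."""
    n_features = len(x[0])
    gnn = TinyGNN(hidden, seed)
    columns = []
    for j in range(n_features):
        values = [row[j] for row in x]  # feature j across all samples
        columns.append(gnn.embed(values, feature_graph(values, k)))
    # Transpose: rows index hidden units, columns index input features.
    return [[columns[j][h] for j in range(n_features)] for h in range(hidden)]


# Toy, made-up data: 6 samples x 4 features.
x = [
    [0.1, 2.0, -1.0, 0.5],
    [0.3, 1.8, -0.7, 0.4],
    [1.2, 0.2, 0.9, 0.6],
    [1.0, 0.1, 1.1, 0.3],
    [0.2, 2.2, -0.9, 0.5],
    [1.1, 0.3, 1.0, 0.7],
]
w1 = init_first_layer(x, hidden=8)
print(len(w1), len(w1[0]))  # 8 4
```

Sharing one GNN across all feature graphs keeps the number of free parameters independent of the number of features, which is the appeal of generating the first layer from data rather than learning each of its weights directly; whether the original method shares weights in exactly this way is an assumption of this sketch.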

Comments:	Presented at the 17th Machine Learning in Computational Biology (MLCB) meeting, 2022
Subjects:	Machine Learning (cs.LG); Quantitative Methods (q-bio.QM)
Cite as:	arXiv:2211.06302 [cs.LG]
	(or arXiv:2211.06302v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2211.06302

Submission history

From: Andrei Margeloiu
[v1] Fri, 11 Nov 2022 16:13:34 UTC (275 KB)
[v2] Mon, 29 May 2023 06:28:17 UTC (305 KB)
[v3] Fri, 17 Nov 2023 15:14:41 UTC (345 KB)
[v4] Sat, 17 Aug 2024 20:16:45 UTC (439 KB)
