COMET: Benchmark for Comprehensive Biological Multi-omics Evaluation Tasks and Language Models

Ren, Yuchen; Han, Wenwei; Zhang, Qianyuan; Tang, Yining; Bai, Weiqiang; Cai, Yuchen; Qiao, Lifeng; Jiang, Hao; Yuan, Dong; Chen, Tao; Sun, Siqi; Tan, Pan; Ouyang, Wanli; Dong, Nanqing; Ma, Xinzhu; Ye, Peng

arXiv:2412.10347 (q-bio)

[Submitted on 13 Dec 2024]

Abstract: As key elements within the central dogma, DNA, RNA, and proteins play crucial roles in maintaining life by ensuring accurate genetic expression and implementation. Although research on these molecules has profoundly impacted fields like medicine, agriculture, and industry, the diversity of machine learning approaches (from traditional statistical methods to deep learning models and large language models) makes it difficult for researchers to choose the most suitable model for a specific task. The difficulty is greatest for cross-omics and multi-omics tasks, where comprehensive benchmarks are lacking. To address this, we introduce COMET (Benchmark for Biological COmprehensive Multi-omics Evaluation Tasks and Language Models), the first comprehensive multi-omics benchmark, designed to evaluate models across single-omics, cross-omics, and multi-omics tasks. First, we curate and develop a diverse collection of downstream tasks and datasets covering key structural and functional aspects of DNA, RNA, and proteins, including tasks that span multiple omics levels. Then, we evaluate existing foundational language models for DNA, RNA, and proteins, as well as the newly proposed multi-omics method, offering insights into their performance in integrating and analyzing data from different biological modalities. This benchmark aims to define critical issues in multi-omics research and guide future directions, ultimately promoting advances in understanding biological processes through integrated analysis of data from different omics.

Subjects:	Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2412.10347 [q-bio.BM]
	(or arXiv:2412.10347v1 [q-bio.BM] for this version)
	https://doi.org/10.48550/arXiv.2412.10347

Submission history

From: Yuchen Ren
[v1] Fri, 13 Dec 2024 18:42:00 UTC (11,946 KB)
