Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

Takeda, Taikai; Hamada, Michiaki

doi:10.1093/bioinformatics/btx643

Quantitative Biology > Quantitative Methods

arXiv:1705.06911 (q-bio)

[Submitted on 19 May 2017 (v1), last revised 15 Oct 2017 (this version, v2)]

Title: Beyond similarity assessment: Selecting the optimal model for sequence alignment via the Factorized Asymptotic Bayesian algorithm

Authors: Taikai Takeda, Michiaki Hamada

Abstract: Pair Hidden Markov Models (PHMMs) are probabilistic models used for pairwise sequence alignment, a quintessential problem in bioinformatics. PHMMs include three types of hidden states: match, insertion and deletion. Most previous studies have used one or two hidden states for each PHMM state type. However, few studies have examined the number of states suitable for representing sequence data or improving alignment accuracy. We developed a novel method to select superior models (including the number of hidden states) for PHMMs. Our method selects the model with the highest posterior probability using the Factorized Information Criterion (FIC), which is widely utilised in model selection for probabilistic models with hidden variables. Our simulations indicated that this method has excellent model selection capabilities with slightly improved alignment accuracy. We applied our method to DNA datasets from 5 and 28 species, ultimately selecting more complex models than those used in previous studies.

Comments:	This article has been accepted for publication in Bioinformatics, published by Oxford University Press
Subjects:	Quantitative Methods (q-bio.QM); Machine Learning (stat.ML)
Cite as:	arXiv:1705.06911 [q-bio.QM]
	(or arXiv:1705.06911v2 [q-bio.QM] for this version)
	https://doi.org/10.48550/arXiv.1705.06911
Journal reference:	Bioinformatics, 2017, btx643
Related DOI:	https://doi.org/10.1093/bioinformatics/btx643

Submission history

From: Taikai Takeda
[v1] Fri, 19 May 2017 09:49:59 UTC (206 KB)
[v2] Sun, 15 Oct 2017 06:52:17 UTC (270 KB)
