Efficient Computation of Sequence Mappability

Charalampopoulos, Panagiotis; Iliopoulos, Costas S.; Kociumaka, Tomasz; Pissis, Solon P.; Radoszewski, Jakub; Straszyński, Juliusz

arXiv:1807.11702 (cs)

[Submitted on 31 Jul 2018 (v1), last revised 16 Jun 2021 (this version, v3)]

Abstract: In the $(k,m)$-mappability problem, for a given sequence $T$ of length $n$, the goal is to compute a table whose $i$th entry is the number of indices $j \ne i$ such that the length-$m$ substrings of $T$ starting at positions $i$ and $j$ have at most $k$ mismatches. Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of $k=1$. We present several efficient algorithms for the general case of the problem. Our main result is an algorithm that, for $k=\mathcal{O}(1)$, works in $\mathcal{O}(n)$ space and, with high probability, in $\mathcal{O}(n \cdot \min\{m^k,\log^k n\})$ time. Our algorithm requires a careful adaptation of the $k$-errata trees of Cole et al. [STOC 2004] to avoid multiple counting of pairs of substrings. Our technique can also be applied to solve the all-pairs Hamming distance problem introduced by Crochemore et al. [WABI 2017]. We further develop $\mathcal{O}(n^2)$-time algorithms to compute all $(k,m)$-mappability tables for a fixed $m$ and all $k\in \{0,\ldots,m\}$ or a fixed $k$ and all $m\in\{k,\ldots,n\}$. Finally, we show that, for $k,m = \Theta(\log n)$, the $(k,m)$-mappability problem cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis fails.
This is an improved and extended version of a paper that was presented at SPIRE 2018.
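
To make the problem definition concrete, the following is a minimal brute-force sketch of the $(k,m)$-mappability table. It is not the algorithm of the paper: it simply compares every pair of length-$m$ substrings and so runs in $\mathcal{O}(n^2 m)$ time. The function name `naive_mappability` and the 0-based indexing of positions are assumptions made for illustration.

```python
def naive_mappability(text: str, m: int, k: int) -> list[int]:
    """Brute-force (k, m)-mappability table (illustrative baseline only).

    Entry i counts the start positions j != i whose length-m substring
    differs from the one starting at i in at most k positions.
    Positions are 0-based, which is an assumption of this sketch.
    """
    if m <= 0 or m > len(text):
        return []
    starts = len(text) - m + 1          # number of length-m substrings
    table = [0] * starts
    for i in range(starts):
        for j in range(i + 1, starts):
            mismatches = 0
            for a, b in zip(text[i:i + m], text[j:j + m]):
                if a != b:
                    mismatches += 1
                    if mismatches > k:  # stop early once the budget is exceeded
                        break
            if mismatches <= k:
                # Hamming distance is symmetric, so the pair counts for both.
                table[i] += 1
                table[j] += 1
    return table


print(naive_mappability("aabaa", 2, 0))  # [1, 0, 0, 1]
print(naive_mappability("aabaa", 2, 1))  # [3, 2, 2, 3]
```

For `"aabaa"` with $m=2$, the substrings are `aa`, `ab`, `ba`, `aa`. With $k=0$ only the two copies of `aa` match each other; with $k=1$ every pair except (`ab`, `ba`) is within the mismatch budget.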

Comments:	Accepted to SPIRE 2018
Subjects:	Data Structures and Algorithms (cs.DS)
ACM classes:	F.2.2
Cite as:	arXiv:1807.11702 [cs.DS]
	(or arXiv:1807.11702v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1807.11702

Submission history

From: Juliusz Straszyński
[v1] Tue, 31 Jul 2018 08:49:11 UTC (20 KB)
[v2] Mon, 14 Jun 2021 15:40:47 UTC (266 KB)
[v3] Wed, 16 Jun 2021 20:03:10 UTC (266 KB)
