Streaming dictionary matching with mismatches

Gawrychowski, Paweł; Starikovskaya, Tatiana

Computer Science > Data Structures and Algorithms

arXiv:1809.02517 (cs)

[Submitted on 7 Sep 2018 (v1), last revised 20 Jun 2021 (this version, v3)]

Abstract: In the $k$-mismatch problem we are given a pattern of length $n$ and a text and must find all locations where the Hamming distance between the pattern and the text is at most $k$. A series of recent breakthroughs has resulted in an ultra-efficient streaming algorithm for this problem that requires only $O(k \log \frac{n}{k})$ space and $O(\log \frac{n}{k} (\sqrt{k \log k} + \log^3 n))$ time per letter [Clifford, Kociumaka, Porat, SODA 2019]. In this work, we consider a strictly harder problem called dictionary matching with $k$ mismatches. In this problem, we are given a dictionary of $d$ patterns, where the length of each pattern is at most $n$, and must find all substrings of the text that are within Hamming distance $k$ of one of the patterns. We develop a streaming algorithm for this problem with $O(k d \log^k d \mathrm{polylog}(n))$ space and $O(k \log^{k} d \mathrm{polylog}(n) + |\mathrm{occ}|)$ time per position of the text. The algorithm is randomised and outputs correct answers with high probability. On the lower bound side, we show that any streaming algorithm for dictionary matching with $k$ mismatches requires $\Omega(k d)$ bits of space.
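To make the problem statement concrete, the following is a brute-force reference for dictionary matching with $k$ mismatches in the streaming setting. It is not the authors' algorithm: it simply keeps the last $n$ letters of the stream and compares every pattern against the matching suffix, so it uses $O(n)$ space and roughly $O(nd)$ time per letter instead of the bounds stated above. The function names and the convention of reporting 0-based end positions are illustrative assumptions.

```python
from __future__ import annotations

from collections import deque
from typing import Iterable, Iterator


def hamming_at_most(a: str, b: str, k: int) -> bool:
    """Return True if equal-length strings a and b differ in at most k positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > k:  # stop early once the budget is exceeded
                return False
    return True


def naive_dictionary_k_mismatch(
    patterns: list[str], text: Iterable[str], k: int
) -> Iterator[tuple[int, int]]:
    """Brute-force baseline (hypothetical helper, not the paper's method).

    Reads the text one letter at a time and yields (end_position, pattern_index)
    whenever the substring ending at end_position (0-based, inclusive) is within
    Hamming distance k of patterns[pattern_index].
    """
    if not patterns or any(len(p) == 0 for p in patterns):
        raise ValueError("patterns must be a non-empty list of non-empty strings")
    n = max(len(p) for p in patterns)
    window = deque(maxlen=n)  # only the last n letters of the stream are kept
    for pos, letter in enumerate(text):
        window.append(letter)
        suffix = "".join(window)
        for idx, p in enumerate(patterns):
            m = len(p)
            # Compare the pattern with the length-m substring ending at pos.
            if m <= len(suffix) and hamming_at_most(suffix[-m:], p, k):
                yield pos, idx
```

For example, with patterns `["abc", "bd"]`, text `"abdabc"` and $k = 1$, the substring `"abd"` ending at position 2 is one mismatch away from `"abc"` and the substring `"bd"` matches exactly, so both are reported at position 2; at position 5, `"abc"` matches exactly and `"bc"` is one mismatch from `"bd"`.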

Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1809.02517 [cs.DS]
	(or arXiv:1809.02517v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1809.02517

Submission history

From: Tatiana Starikovskaya
[v1] Fri, 7 Sep 2018 14:54:53 UTC (20 KB)
[v2] Tue, 22 Jan 2019 12:10:23 UTC (17 KB)
[v3] Sun, 20 Jun 2021 15:16:20 UTC (26 KB)
