On-line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List

Kopelowitz, Tsvi

Abstract: Text Indexing is a fundamental algorithmic problem in which one wishes to preprocess a text so that occurrences of query patterns can be located quickly. In the ever-evolving world of dynamic and on-line data, there is also a need to index texts that arrive on-line, i.e., one character at a time, while still locating such patterns quickly. In this paper, a new solution for on-line indexing is presented: an on-line suffix tree construction that takes $O(\log \log n + \log\log |\Sigma|)$ worst-case expected time per character, where $n$ is the size of the string and $\Sigma$ is the alphabet. This improves upon all previously known on-line suffix tree constructions for general alphabets, at the cost of the running time holding only in expectation.
The main idea is to reduce the problem of constructing a suffix tree on-line to a variant of the order maintenance problem, which may be of independent interest. In the famous order maintenance problem, one wishes to maintain a dynamic list $L$ of size $n$ under insertions, deletions, and order queries. In an order query, one is given two nodes from $L$ and must determine which node precedes the other in $L$. In the Predecessor search on Dynamic Subsets of an Ordered Dynamic List problem (POLP), it is also necessary to maintain dynamic subsets of $L$ so that, given some $u\in L$, the predecessor of $u$ in any subset can be located quickly. This paper provides an efficient data structure that solves the POLP with worst-case expected bounds matching the currently best known bounds for predecessor search in the RAM model, improving over a solution which may be implicitly obtained from Dietz [Die89].
Furthermore, this paper improves or simplifies bounds for several additional applications, including fully-persistent arrays and the Order-Maintenance Problem.
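
To make the POLP interface described above concrete, here is a minimal naive sketch in Python. It is not the data structure from the paper: every operation here takes time linear in $n$, whereas the paper targets bounds matching predecessor search in the RAM model. The class name `NaivePOLP`, its method names, and the choice to treat "predecessor" as the closest subset member strictly before $u$ are all assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Optional, Set


class NaivePOLP:
    """Naive linear-time baseline for the POLP interface.

    Hypothetical names throughout; this only illustrates the operations,
    not the efficient structure the paper describes.
    """

    def __init__(self) -> None:
        self._order: List[Hashable] = []            # the ordered list L
        self._subsets: Dict[str, Set[Hashable]] = {}  # subset name -> members

    def insert_after(self, node: Hashable, anchor: Optional[Hashable] = None) -> None:
        """Insert node right after anchor, or at the front if anchor is None."""
        pos = 0 if anchor is None else self._order.index(anchor) + 1
        self._order.insert(pos, node)

    def delete(self, node: Hashable) -> None:
        """Remove node from L and from every subset."""
        self._order.remove(node)
        for members in self._subsets.values():
            members.discard(node)

    def precedes(self, u: Hashable, v: Hashable) -> bool:
        """Order query: does u come before v in L?"""
        return self._order.index(u) < self._order.index(v)

    def add_to_subset(self, name: str, node: Hashable) -> None:
        self._subsets.setdefault(name, set()).add(node)

    def remove_from_subset(self, name: str, node: Hashable) -> None:
        self._subsets.get(name, set()).discard(node)

    def predecessor(self, name: str, u: Hashable) -> Optional[Hashable]:
        """Closest member of subset `name` strictly before u in L (assumed
        convention), or None if there is no such member."""
        members = self._subsets.get(name, set())
        for node in reversed(self._order[: self._order.index(u)]):
            if node in members:
                return node
        return None
```

For example, with $L = (a, b, c, d)$ and a subset $S = \{a, c\}$, the query `predecessor("S", "d")` scans backwards from `d` and returns `c`; after `delete("c")` the same query returns `a`.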

Comments:	Accepted to FOCS 2012, 17 pages
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1208.3798 [cs.DS]
	(or arXiv:1208.3798v1 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1208.3798

