Computer Science > Databases
[Submitted on 8 Feb 2022]
Title:OSM-tree: A Sortedness-Aware Index
View PDFAbstract:Indexes facilitate efficient querying when the selection predicate is on an indexed key. As a result, when loading data, if we anticipate future selective (point or range) queries, we typically maintain an index that is gradually populated as new data is ingested. In that respect, indexing can be perceived as the process of adding structure to an incoming, otherwise unsorted, data collection. The process of adding structure comes at a cost, as instead of simply appending incoming data, every new entry is inserted into the index. If the data ingestion order matches the indexed attribute order, the ingestion cost is entirely redundant and can be avoided (e.g., via bulk loading in a B+-tree). However, state-of-the-art index designs do not benefit when data is ingested in an order that is close to being sorted but not fully sorted. In this paper, we study how indexes can benefit from partial data sortedness or near-sortedness, and we propose an ensemble of techniques that combine bulk loading, index appends, variable node fill/split factor, and buffering, to optimize the ingestion cost of a tree index in presence of partial data sortedness. We further augment the proposed design with necessary metadata structures to ensure competitive read performance. We apply the proposed design paradigm on a state-of-the-art B+-tree, and we propose the Ordered Sort-Merge tree (OSM-tree). OSM-tree outperforms the state of the art by up to 8.8x in ingestion performance in the presence of sortedness, while falling back to a B+-tree's ingestion performance when data is scrambled. OSM-tree offers competitive query performance, leading to performance benefits between 28% and 5x for mixed read/write workloads.
References & Citations
Bibliographic and Citation Tools
Bibliographic Explorer (What is the Explorer?)
Connected Papers (What is Connected Papers?)
Litmaps (What is Litmaps?)
scite Smart Citations (What are Smart Citations?)
Code, Data and Media Associated with this Article
alphaXiv (What is alphaXiv?)
CatalyzeX Code Finder for Papers (What is CatalyzeX?)
DagsHub (What is DagsHub?)
Gotit.pub (What is GotitPub?)
Hugging Face (What is Huggingface?)
Papers with Code (What is Papers with Code?)
ScienceCast (What is ScienceCast?)
Demos
Recommenders and Search Tools
Influence Flower (What are Influence Flowers?)
CORE Recommender (What is CORE?)
arXivLabs: experimental projects with community collaborators
arXivLabs is a framework that allows collaborators to develop and share new arXiv features directly on our website.
Both individuals and organizations that work with arXivLabs have embraced and accepted our values of openness, community, excellence, and user data privacy. arXiv is committed to these values and only works with partners that adhere to them.
Have an idea for a project that will add value for arXiv's community? Learn more about arXivLabs.