COAX: Correlation-Aware Indexing on Multidimensional Data with Soft Functional Dependencies

Hadian, Ali; Ghaffari, Behzad; Wang, Taiyi; Heinis, Thomas

Computer Science > Databases

arXiv:2006.16393v3 (cs)

[Submitted on 29 Jun 2020 (v1), last revised 2 Feb 2021 (this version, v3)]



Abstract: Recent work proposed learned index structures, which learn the distribution of the underlying dataset to improve performance. The initial work on learned indexes has shown that, by learning the cumulative distribution function of the data, index structures such as the B-Tree can improve their performance by an order of magnitude while having a smaller memory footprint.
In this paper, we present COAX, a learned index for multidimensional data that, instead of learning the distribution of keys, learns the correlations between attributes of the dataset. Our approach is driven by the observation that, in many datasets, the values of two (or more) attributes are correlated. COAX exploits these correlations to reduce the dimensionality of the dataset.
More precisely, we learn how to infer one (or more) attributes $C_d$ from the remaining attributes and hence no longer need to index $C_d$. This reduces the dimensionality of the index, making it smaller and more efficient.
We theoretically investigate the effectiveness of the proposed technique based on the predictability of the functionally dependent (FD) attributes. We further show experimentally that, by predicting correlated attributes in the data, we can improve query execution time and reduce the memory overhead of the index. In our experiments, we reduce the execution time by 25% while reducing the memory footprint of the index by four orders of magnitude.
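
To make the idea in the abstract concrete, the following is a minimal Python sketch of indexing one attribute and inferring a correlated one. It is not the paper's implementation: it assumes just two attributes, a linear model with positive slope, and residuals bounded by the observed minimum and maximum (no separate outlier handling). The names `SoftFDIndex`, `fit_line`, `x_bounds`, and `query_y` are hypothetical and chosen for illustration only.

```python
import bisect
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


class SoftFDIndex:
    """Indexes only x; range queries on the correlated y are mapped to x.

    Assumption (not from the paper): y = slope * x + intercept + r with
    slope > 0 and r inside [err_lo, err_hi] for every stored row.
    """

    def __init__(self, rows):
        self.rows = sorted(rows)               # (x, y) pairs ordered by x
        self.xs = [x for x, _ in self.rows]    # the only indexed attribute
        ys = [y for _, y in self.rows]
        self.slope, self.intercept = fit_line(self.xs, ys)
        if self.slope <= 0:
            raise ValueError("this sketch only handles a positive slope")
        residuals = [y - (self.slope * x + self.intercept) for x, y in self.rows]
        self.err_lo, self.err_hi = min(residuals), max(residuals)

    def x_bounds(self, y_lo, y_hi):
        """Positions in the x-ordering that can hold rows with y in [y_lo, y_hi]."""
        x_lo = (y_lo - self.intercept - self.err_hi) / self.slope
        x_hi = (y_hi - self.intercept - self.err_lo) / self.slope
        # Small slack so floating-point rounding cannot drop a boundary row.
        tol = 1e-9 * max(1.0, abs(x_lo), abs(x_hi))
        start = bisect.bisect_left(self.xs, x_lo - tol)
        stop = bisect.bisect_right(self.xs, x_hi + tol)
        return start, stop

    def query_y(self, y_lo, y_hi):
        """Scan the candidate x-range, then filter on the true y values."""
        start, stop = self.x_bounds(y_lo, y_hi)
        return [row for row in self.rows[start:stop] if y_lo <= row[1] <= y_hi]


if __name__ == "__main__":
    random.seed(0)
    data = []
    for _ in range(1000):
        x = random.uniform(0, 1000)
        data.append((x, 2 * x + 5 + random.uniform(-3, 3)))
    index = SoftFDIndex(data)
    start, stop = index.x_bounds(500, 600)
    print("candidates scanned:", stop - start, "of", len(data))
    print("matching rows:", len(index.query_y(500, 600)))
```

In this sketch the tighter the correlation (the narrower the residual band), the fewer candidate rows are scanned, which is the intuition behind dropping the dependent attribute from the index.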

Subjects:	Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Cite as:	arXiv:2006.16393 [cs.DB]
	(or arXiv:2006.16393v3 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2006.16393

Submission history

From: Ali Hadian
[v1] Mon, 29 Jun 2020 21:22:15 UTC (1,278 KB)
[v2] Fri, 15 Jan 2021 20:47:00 UTC (2,822 KB)
[v3] Tue, 2 Feb 2021 15:43:57 UTC (2,822 KB)
