Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

Liu, Feilong; Blanas, Spyros

doi:10.1145/2806777.2806944

arXiv:1507.03049 (cs)

[Submitted on 11 Jul 2015 (v1), last revised 21 Jul 2015 (this version, v2)]

Authors: Feilong Liu, Spyros Blanas (The Ohio State University)

Abstract: Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.

Comments:	15 pages, 8 figures, extended version of the paper to appear in SoCC'15
Subjects:	Databases (cs.DB)
Cite as:	arXiv:1507.03049 [cs.DB]
	(or arXiv:1507.03049v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1507.03049
Related DOI:	https://doi.org/10.1145/2806777.2806944

Submission history

From: Feilong Liu
[v1] Sat, 11 Jul 2015 00:17:59 UTC (391 KB)
[v2] Tue, 21 Jul 2015 19:57:52 UTC (392 KB)
