Scheduling of Intermittent Query Processing

Chandrasekaran, Saranya; Sudarshan, S.

arXiv:2306.06678 (cs)

[Submitted on 11 Jun 2023 (v1), last revised 20 Sep 2024 (this version, v4)]

Abstract: Stream processing is usually done either tuple by tuple or in micro-batches. In many applications, however, tuples over a predefined duration (window) must be processed within certain deadlines. Processing such queries with stream processing engines can be very inefficient, since there is often significant overhead per tuple or micro-batch. The cost of computation can be reduced significantly by exploiting the wider window available for computation: because the result is needed only at the deadline, tuples can be processed in larger batches instead of micro-batches. In this work, we present scheduling schemes that minimize the overhead cost while meeting the query deadline constraints, for both single-query and multi-query scenarios. The proposed scheduling algorithms have been implemented as a Custom Query Scheduler on top of Apache Spark. Our performance study with TPC-H data, in single-query and multi-query modes, shows orders-of-magnitude improvement compared to naively using Spark Streaming.

Subjects:	Databases (cs.DB)
Cite as:	arXiv:2306.06678 [cs.DB]
	(or arXiv:2306.06678v4 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2306.06678

Submission history

From: Saranya C
[v1] Sun, 11 Jun 2023 13:34:47 UTC (612 KB)
[v2] Sun, 21 Apr 2024 13:58:04 UTC (768 KB)
[v3] Thu, 5 Sep 2024 13:48:41 UTC (768 KB)
[v4] Fri, 20 Sep 2024 16:16:25 UTC (769 KB)
