Making massive probabilistic databases practical

Todor, Andrei; Dobra, Alin; Kahveci, Tamer; Dudley, Christopher

arXiv:1307.0844 (cs)

[Submitted on 2 Jul 2013]

Abstract: The existence of incomplete and imprecise data has moved the database paradigm from deterministic to probabilistic information. Probabilistic databases contain tuples that may or may not exist with some probability. As a result, the number of possible deterministic database instances that can be observed from a probabilistic database grows exponentially with the number of probabilistic tuples. In this paper, we consider the problem of answering both aggregate and non-aggregate queries on massive probabilistic databases. We adopt the tuple independence model, in which each tuple is assigned a probability value. We develop a method that exploits Probability Generating Functions (PGFs) to answer such queries efficiently. Our method maintains a polynomial for each tuple. It incrementally builds a master polynomial that expresses the distribution of the possible result values precisely. We also develop an approximation method that finds the distribution of the result value with negligible error. Our experiments suggest that our methods are orders of magnitude faster than the most recent systems that answer such queries, including MayBMS and SPROUT. In our experiments, we were able to scale up to several terabytes of data on TPC-H queries, while existing methods could only run for a few gigabytes of data on the same queries.
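The idea of one polynomial per tuple combined into a master polynomial can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a SUM aggregate over non-negative integer values under tuple independence, and the function names (`tuple_pgf`, `multiply`, `sum_distribution`) and the sample data are hypothetical. Each tuple with existence probability p and value v contributes the factor (1 - p) + p*x^v, and the coefficient of x^k in the product is the probability that the SUM equals k.

```python
from collections import defaultdict


def tuple_pgf(prob, value):
    """PGF of one independent tuple, (1 - p) + p * x**value,
    stored as a mapping {exponent: coefficient}."""
    pgf = defaultdict(float)
    pgf[0] += 1.0 - prob      # tuple absent: contributes 0 to the sum
    pgf[value] += prob        # tuple present: contributes `value`
    return dict(pgf)


def multiply(a, b):
    """Multiply two sparse polynomials given as {exponent: coefficient}."""
    out = defaultdict(float)
    for exp_a, coef_a in a.items():
        for exp_b, coef_b in b.items():
            out[exp_a + exp_b] += coef_a * coef_b
    return dict(out)


def sum_distribution(tuples):
    """Incrementally build the master polynomial for SUM over
    independent (probability, value) tuples."""
    master = {0: 1.0}  # empty relation: the sum is 0 with probability 1
    for prob, value in tuples:
        master = multiply(master, tuple_pgf(prob, value))
    return master


# Hypothetical data: (existence probability, integer value) pairs.
tuples = [(0.5, 10), (0.5, 20), (0.25, 10)]
dist = sum_distribution(tuples)
for total in sorted(dist):
    print(total, dist[total])
```

Setting every value to 1 turns the same product into the distribution of COUNT. The sketch multiplies polynomials naively, so it only shows why the master polynomial captures the exact distribution without enumerating the exponentially many possible database instances; it says nothing about how the paper achieves its reported scale.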

Subjects:	Databases (cs.DB)
Cite as:	arXiv:1307.0844 [cs.DB]
	(or arXiv:1307.0844v1 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1307.0844

Submission history

From: Andrei Todor
[v1] Tue, 2 Jul 2013 20:50:15 UTC (100 KB)
