Visualization-Aware Sampling for Very Large Databases

Park, Yongjoo; Cafarella, Michael; Mozafari, Barzan

arXiv:1510.03921 (cs)

[Submitted on 13 Oct 2015 (v1), last revised 23 Jan 2017 (this version, v2)]

Abstract: Interactive visualizations are crucial in ad hoc data exploration and analysis. However, with the growing number of massive datasets, generating visualizations at interactive timescales is increasingly challenging. One approach to speeding up a visualization tool is data reduction, which lowers the computational overhead at a potential cost in visualization accuracy. Common data reduction techniques, such as uniform and stratified sampling, do not exploit the fact that the sampled tuples will be transformed into a visualization for human consumption.
We propose visualization-aware sampling (VAS), which guarantees high-quality visualizations with a small subset of the entire dataset. We validate our method on scatter and map plots for three common visualization goals: regression, density estimation, and clustering. The key to our sampling method's success is choosing tuples that minimize a visualization-inspired loss function. Our user study confirms that optimizing this loss function correlates strongly with user success in using the resulting visualizations. We also show that our optimization problem is NP-hard and propose an efficient approximation algorithm. Our experiments show that, compared to previous methods, (i) with the same sample size, VAS improves users' success by up to 35% in various visualization tasks, and (ii) VAS can achieve a required visualization quality up to 400 times faster.
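
As a rough illustration of what "choosing tuples which minimize a loss function" can look like in practice, the Python sketch below selects a fixed-size sample of 2D points by greedily swapping points in whenever a swap lowers a loss. The abstract does not define the loss function or the approximation algorithm, so everything concrete here is an assumption made for illustration: the Gaussian proximity kernel, the `bandwidth` parameter, the `pairwise_loss` definition, and the swap-based `interchange_sample` routine are hypothetical and should not be read as the authors' method.

```python
import math
import random


def kernel(p, q, bandwidth):
    """Assumed Gaussian proximity: close to 1 for nearby points, close to 0 for distant ones."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (bandwidth ** 2))


def pairwise_loss(sample, bandwidth):
    """Hypothetical loss: total proximity over all pairs of sampled points.

    A low value means the sampled points are spread out rather than
    piled on top of each other in the plot.
    """
    return sum(
        kernel(sample[i], sample[j], bandwidth)
        for i in range(len(sample))
        for j in range(i + 1, len(sample))
    )


def interchange_sample(points, k, bandwidth):
    """Greedy heuristic: scan the data once, swapping a candidate into the
    sample whenever the best possible swap strictly lowers the loss."""
    sample = list(points[:k])
    for candidate in points[k:]:
        best_loss = pairwise_loss(sample, bandwidth)
        best_idx = None
        for idx in range(len(sample)):
            trial = sample[:idx] + [candidate] + sample[idx + 1:]
            trial_loss = pairwise_loss(trial, bandwidth)
            if trial_loss < best_loss:
                best_loss, best_idx = trial_loss, idx
        if best_idx is not None:
            sample[best_idx] = candidate
    return sample


# Usage: a dense cluster followed by a few scattered points (synthetic data).
rng = random.Random(0)
dense = [(rng.gauss(0, 0.1), rng.gauss(0, 0.1)) for _ in range(200)]
sparse = [(rng.uniform(-5, 5), rng.uniform(-5, 5)) for _ in range(20)]
sample = interchange_sample(dense + sparse, 10, bandwidth=1.0)
```

This sketch recomputes the full loss for every trial swap, which is fine for a few hundred points but far too slow for the dataset sizes the abstract targets. It is a heuristic with no approximation guarantee, so it only conveys the flavor of loss-driven sampling as opposed to uniform or stratified sampling.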

Subjects:	Databases (cs.DB)
Cite as:	arXiv:1510.03921 [cs.DB]
	(or arXiv:1510.03921v2 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.1510.03921
Journal reference:	Data Engineering (ICDE), 2016 IEEE 32nd International Conference on. IEEE, 2016

Submission history

From: Yongjoo Park
[v1] Tue, 13 Oct 2015 22:51:36 UTC (2,580 KB)
[v2] Mon, 23 Jan 2017 23:47:49 UTC (3,290 KB)
