The Adaptive Sampling Revisited

Drescher, Matthew; Louchard, Guy; Swan, Yvik

arXiv:1805.08043 (cs)

[Submitted on 21 May 2018 (v1), last revised 16 May 2019 (this version, v3)]

Abstract: The problem of estimating the number $n$ of distinct keys in a large collection of $N$ data items is well known in computer science. A classical algorithm is adaptive sampling (AS): $n$ can be estimated by $R \cdot 2^D$, where $R$ is the final bucket (cache) size and $D$ is the final depth at the end of the process. Several new interesting questions can be asked about AS (some of them were suggested by P. Flajolet and popularized by J. Lumbroso). The distribution of $W=\log (R2^D/n)$ is known; we rederive this distribution in a simpler way. We provide new results on the moments of $D$ and $W$. We also analyze the distribution of the final cache size $R$. We consider colored keys: assume that among the $n$ distinct keys, $n_C$ have color $C$. We show how to estimate $p=\frac{n_C}{n}$. We also study colored keys with multiplicities drawn from some distribution function, and we want to estimate the mean and variance of this distribution. Finally, we consider the case where neither colors nor multiplicities are known, and we want to estimate the related parameters. An appendix is devoted to the case where the hashing function provides bits with probability different from $1/2$.
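The estimator $R \cdot 2^D$ mentioned in the abstract can be illustrated with a minimal Python sketch of classical adaptive sampling. This is not taken from the paper: the function names `hash64`, `adaptive_sampling`, and `estimate_distinct`, the use of SHA-256 as the hash, the default capacity of 64, and the choice of testing low-order hash bits are all assumptions made for illustration only.

```python
import hashlib


def hash64(key):
    """Map a key to a 64-bit integer (SHA-256 is an illustrative choice)."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def adaptive_sampling(keys, capacity=64):
    """Run one pass over `keys`; return (R, D), the final cache size and depth.

    A key is kept at depth D only if the D low-order bits of its hash are
    all zero (an assumption of this sketch; any fixed set of D hash bits
    behaves the same way for a uniform hash).
    """
    depth = 0
    cache = {}  # distinct key -> its 64-bit hash
    for key in keys:
        h = hash64(key)
        if h & ((1 << depth) - 1):
            continue  # key does not belong to the current sample
        cache[key] = h  # duplicates overwrite themselves, so they count once
        while len(cache) > capacity:
            # Overflow: go one level deeper and keep about half of the keys.
            depth += 1
            mask = (1 << depth) - 1
            cache = {k: v for k, v in cache.items() if v & mask == 0}
    return len(cache), depth


def estimate_distinct(keys, capacity=64):
    """Estimate the number n of distinct keys as R * 2^D."""
    r, d = adaptive_sampling(keys, capacity)
    return r * 2 ** d
```

When the number of distinct keys never exceeds the capacity, the depth stays at 0 and the estimate equals the exact count; otherwise each increment of the depth roughly halves the cache, which is why the final size is rescaled by $2^D$.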

Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1805.08043 [cs.DS]
	(or arXiv:1805.08043v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1805.08043

Submission history

From: Guy Louchard
[v1] Mon, 21 May 2018 13:36:30 UTC (20 KB)
[v2] Wed, 15 May 2019 14:11:12 UTC (29 KB)
[v3] Thu, 16 May 2019 07:20:41 UTC (29 KB)
