Approximately Minwise Independence with Twisted Tabulation

Dahlgaard, Søren; Thorup, Mikkel

arXiv:1404.6724 (cs)

[Submitted on 27 Apr 2014 (v1), last revised 1 May 2014 (this version, v2)]

Abstract: A random hash function $h$ is $\varepsilon$-minwise if for any set $S$, $|S|=n$, and element $x\in S$, $\Pr[h(x)=\min h(S)]=(1\pm\varepsilon)/n$. Minwise hash functions with low bias $\varepsilon$ have widespread applications within similarity estimation.
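
To make the definition concrete, the following is a minimal sketch that computes the bias $\varepsilon$ exactly for a small, fully enumerated hash family. It is an illustration only and is not taken from the paper: the function names (`min_probabilities`, `bias`, `all_permutations`, `linear_family`) and the toy linear family are assumptions introduced here. A hash function is represented as a dict from keys to hash values, and probabilities are exact fractions.

```python
from fractions import Fraction
from itertools import permutations


def min_probabilities(family, S):
    """Return {x: Pr[h(x) = min h(S)]} for h drawn uniformly from `family`.

    `family` is an iterable of hash functions, each a dict: key -> hash value.
    """
    family = list(family)
    counts = {x: 0 for x in S}
    for h in family:
        smallest = min(h[y] for y in S)
        for x in S:
            if h[x] == smallest:
                counts[x] += 1
    return {x: Fraction(c, len(family)) for x, c in counts.items()}


def bias(family, S):
    """Smallest eps such that Pr[h(x) = min h(S)] = (1 +/- eps)/n for all x in S."""
    n = len(S)
    probs = min_probabilities(family, S)
    return max(abs(p * n - 1) for p in probs.values())


def all_permutations(universe):
    """Idealized family: every permutation of the universe (hypothetical helper)."""
    universe = list(universe)
    return [dict(zip(universe, perm))
            for perm in permutations(range(len(universe)))]


def linear_family(p):
    """Toy family h(x) = (a*x + b) mod p with a != 0, over keys 0..p-1."""
    return [{x: (a * x + b) % p for x in range(p)}
            for a in range(1, p) for b in range(p)]
```

With the family of all permutations, `bias(all_permutations(range(6)), [0, 2, 5])` is exactly 0, since every element of $S$ is equally likely to be the minimum. Calling `bias(linear_family(7), [0, 2, 5])` reports the exact bias of the toy family on the same set.
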
Hashing from a universe $[u]$, the twisted tabulation hashing of Pǎtraşcu and Thorup [SODA'13] makes $c=O(1)$ lookups in tables of size $u^{1/c}$. Twisted tabulation was invented to get good concentration for hashing-based sampling. Here we show that twisted tabulation yields $\tilde O(1/u^{1/c})$-minwise hashing.
In the classic independence paradigm of Wegman and Carter [FOCS'79], $\tilde O(1/u^{1/c})$-minwise hashing requires $\Omega(\log u)$-independence [Indyk SODA'99]. Pǎtraşcu and Thorup [STOC'11] had shown that simple tabulation, using the same space and number of lookups, yields $\tilde O(1/n^{1/c})$-minwise independence, which is good for large sets but useless for small sets. Our analysis uses some of the same methods, but is much cleaner, bypassing a complicated induction argument.
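
The abstract gives no pseudocode, so the following is only a hedged sketch of how twisted tabulation is commonly formulated: a key is split into $c$ characters, the tables for characters $1,\dots,c-1$ each return a hash value together with an extra "twister" character, and the XOR of the twisters is applied to character 0 before its own lookup. The class name `TwistedTabulation`, its parameters (`c`, `char_bits`, `out_bits`, `seed`), and the bit layout are assumptions made here, and Python's `random` module stands in for truly random table entries.

```python
import random


class TwistedTabulation:
    """Sketch of twisted tabulation for keys of c characters of char_bits bits."""

    def __init__(self, c=4, char_bits=8, out_bits=32, seed=None):
        rng = random.Random(seed)
        self.char_bits = char_bits
        self.out_bits = out_bits
        size = 1 << char_bits  # u^(1/c) entries per table
        # Table for character 0 holds plain hash values.
        self.head = [rng.getrandbits(out_bits) for _ in range(size)]
        # Tables for characters 1..c-1 hold a hash value in the low out_bits
        # bits and a twister character in the high char_bits bits.
        self.tail = [[rng.getrandbits(out_bits + char_bits) for _ in range(size)]
                     for _ in range(c - 1)]

    def __call__(self, x):
        mask = (1 << self.char_bits) - 1
        head_char = x & mask
        acc = 0
        for table in self.tail:          # c - 1 lookups
            x >>= self.char_bits
            acc ^= table[x & mask]
        twister = acc >> self.out_bits   # XOR of all twister characters
        value = acc & ((1 << self.out_bits) - 1)
        # Final lookup: character 0 is "twisted" before indexing its table.
        return value ^ self.head[head_char ^ twister]
```

A minwise sample of a set `S` of integer keys would then be `min(S, key=h)` for `h = TwistedTabulation(seed=1)`; the defaults assume 32-bit keys split into four 8-bit characters.
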

Comments:	To appear in Proceedings of SWAT 2014
Subjects:	Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1404.6724 [cs.DS]
	(or arXiv:1404.6724v2 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1404.6724

Submission history

From: Søren Dahlgaard
[v1] Sun, 27 Apr 2014 07:59:38 UTC (13 KB)
[v2] Thu, 1 May 2014 09:12:24 UTC (13 KB)
