In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond

Boldi, Paolo; Vigna, Sebastiano

arXiv:1308.2144 (cs)

[Submitted on 9 Aug 2013 (v1), last revised 12 Aug 2013 (this version, v2)]

Abstract: Given a social network, which of its nodes are more central? This question has been asked many times in sociology, psychology and computer science, and a whole plethora of centrality measures (a.k.a. centrality indices, or rankings) have been proposed to account for the importance of the nodes of a network. In this paper, we approach the problem of computing geometric centralities, such as closeness and harmonic centrality, on very large graphs; traditionally this task requires an all-pairs shortest-path computation in the exact case, or a number of breadth-first traversals for approximate computations, but these techniques yield very weak statistical guarantees on highly disconnected graphs. We instead assume that the graph is accessed in a semi-streaming fashion, that is, that adjacency lists are scanned almost sequentially, and that a very small amount of memory (on the order of a dozen bytes) per node is available in core memory. We leverage the newly discovered algorithms based on HyperLogLog counters, making it possible to approximate a number of geometric centralities at very high speed and with high accuracy. While the application of similar algorithms for the approximation of closeness was attempted in the MapReduce framework, our exploitation of HyperLogLog counters exponentially reduces the memory footprint, paving the way for in-core processing of networks with a hundred billion nodes using "just" 2 TiB of RAM. Moreover, the computations we describe are inherently parallelizable, and scale linearly with the number of available cores.
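
As a rough illustration of the idea the abstract describes, the sketch below grows, for every node, the ball of nodes reachable within t steps, one sequential pass over the adjacency lists per step, and credits 1/t for each node that enters the ball at step t. It is not the authors' implementation: it assumes exact Python sets where HyperBall would keep one small HyperLogLog counter per node (and would merge counters rather than take set unions), and the function name and the `succ` dictionary layout are hypothetical. As written it sums reciprocal distances from each node to the nodes it can reach; passing the transposed graph gives the sum over incoming distances instead.

```python
def harmonic_by_ball_growth(succ):
    """Sum of 1/d(x, y) over nodes y reachable from x, for every node x.

    succ maps each node to an iterable of its successors (hypothetical layout).
    Exact sets stand in for the approximate per-node counters.
    """
    # Collect every node, including those that appear only as successors.
    nodes = set(succ)
    for targets in succ.values():
        nodes.update(targets)

    ball = {x: {x} for x in nodes}        # ball of radius 0 around each node
    harmonic = {x: 0.0 for x in nodes}

    t = 0
    changed = True
    while changed:
        changed = False
        t += 1
        new_ball = {}
        # One pass over the adjacency lists per iteration.
        for x in nodes:
            grown = set(ball[x])
            for y in succ.get(x, ()):
                grown |= ball[y]          # union with the successors' balls
            gained = len(grown) - len(ball[x])
            if gained:
                # Nodes that entered the ball at step t are at distance exactly t.
                harmonic[x] += gained / t
                changed = True
            new_ball[x] = grown
        ball = new_ball
    return harmonic


if __name__ == "__main__":
    # Directed path a -> b -> c.
    print(harmonic_by_ball_growth({"a": ["b"], "b": ["c"], "c": []}))
```

Unreachable nodes never enter a ball and so contribute nothing, which is why a sum of reciprocal distances stays well defined on disconnected graphs. Each node in the inner loop reads only the previous iteration's balls, so the per-node updates are independent of one another.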

Subjects:	Data Structures and Algorithms (cs.DS); Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
Cite as:	arXiv:1308.2144 [cs.DS]
	(or arXiv:1308.2144v2 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1308.2144

Submission history

From: Sebastiano Vigna
[v1] Fri, 9 Aug 2013 14:56:55 UTC (82 KB)
[v2] Mon, 12 Aug 2013 10:25:46 UTC (82 KB)
