Data Structures and Algorithms

New submissions
Cross-lists
Replacements

See recent articles

Total of 38 entries

Showing up to 2000 entries per page: fewer | more | all

[1] arXiv:2407.00182 [pdf, html, other]: Title: Fast Computation of the Discrete Fourier Transform Square Index Coefficients

Saulo Queiroz, João P. Vilela, Edmundo Monteiro

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Signal Processing (eess.SP)

The $N$-point discrete Fourier transform (DFT) is a cornerstone for several signal processing applications. Many of these applications operate in real-time, making the computational complexity of the DFT a critical performance indicator to be optimized. Unfortunately, whether the $\mathcal{O}(N\log_2 N)$ time complexity of the fast Fourier transform (FFT) can be outperformed remains an unresolved question in the theory of computation. However, in many applications of the DFT -- such as compressive sensing, image processing, and wideband spectral analysis -- only a small fraction of the output signal needs to be computed because the signal is sparse. This motivates the development of algorithms that compute specific DFT coefficients more efficiently than the FFT algorithm. In this article, we show that the number of points of some DFT coefficients can be dramatically reduced by means of elementary mathematical properties. We present an algorithm that compacts the square index coefficients (SICs) of DFT (i.e., $X_{k\sqrt{N}}$, $k=0,1,\cdots, \sqrt{N}-1$, for a square number $N$) from $N$ to $\sqrt{N}$ points at the expense of $N-1$ complex sums and no multiplication. Based on this, any regular DFT algorithm can be straightforwardly applied to compute the SICs with a reduced number of complex multiplications. If $N$ is a power of two, one can combine our algorithm with the FFT to calculate all SICs in $\mathcal{O}(\sqrt{N}\log_2\sqrt{N})$ time complexity.
[2] arXiv:2407.00237 [pdf, html, other]: Title: The Even-Path Problem in Directed Single-Crossing-Minor-Free Graphs

Archit Chauhan, Samir Datta, Chetan Gupta, Vimal Raj Sharma

Subjects: Data Structures and Algorithms (cs.DS)

Finding a simple path of even length between two designated vertices in a directed graph is a fundamental NP-complete problem known as the EvenPath problem. Nedev proved in 1999, that for directed planar graphs, the problem can be solved in polynomial time. More than two decades since then, we make the first progress in extending the tractable classes of graphs for this problem. We give a polynomial time algorithm to solve the EvenPath problem for classes of H-minor-free directed graphs,1 where H is a single-crossing graph. We make two new technical contributions along the way, that might be of independent interest. The first, and perhaps our main, contribution is the construction of small, planar, parity-mimicking networks. These are graphs that mimic parities of all possible paths between a designated set of terminals of the original graph. Finding vertex disjoint paths between given source-destination pairs of vertices is another fundamental problem, known to be NP-complete in directed graphs, though known to be tractable in planar directed graphs. We encounter a natural variant of this problem, that of finding disjoint paths between given pairs of vertices, but with constraints on parity of the total length of paths. The other significant contribution of our paper is to give a polynomial time algorithm for the 3-disjoint paths with total parity problem, in directed planar graphs with some restrictions (and also in directed graphs of bounded treewidth).
[3] arXiv:2407.00511 [pdf, html, other]: Title: Wooly Graphs : A Mathematical Framework For Knitting

Kathryn Gray, Brian Bell, Diana Sieper, Stephen Kobourov, Falk Schreiber, Karsten Klein, Seokhee Hong

Comments: 11 pages, 4 tables, 5 figures

Subjects: Data Structures and Algorithms (cs.DS)

This paper aims to develop a mathematical foundation to model knitting with graphs. We provide a precise definition for knit objects with a knot theoretic component and propose a simple undirected graph, a simple directed graph, and a directed multigraph model for any arbitrary knit object. Using these models, we propose natural categories related to the complexity of knitting structures. We use these categories to explore the hardness of determining whether a knit object of each class exists for a given graph. We show that while this problem is NP-hard in general, under specific cases, there are linear and polynomial time algorithms which take advantage of unique properties of common knitting techniques. This work aims to bridge the gap between textile arts and graph theory, offering a useful and rigorous framework for analyzing knitting objects using their corresponding graphs and for generating knitting objects from graphs.
[4] arXiv:2407.00573 [pdf, html, other]: Title: A Simple Representation of Tree Covering Utilizing Balanced Parentheses and Efficient Implementation of Average-Case Optimal RMQs

Kou Hamada, Sankardeep Chakraborty, Seungbum Jo, Takuto Koriyama, Kunihiko Sadakane, Srinivasa Rao Satti

Comments: To appear in ESA 2024

Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)

Tree covering is a technique for decomposing a tree into smaller-sized trees with desirable properties, and has been employed in various succinct data structures. However, significant hurdles stand in the way of a practical implementation of tree covering: a lot of pointers are used to maintain the tree-covering hierarchy and many indices for tree navigational queries consume theoretically negligible yet practically vast space. To tackle these problems, we propose a simple representation of tree covering using a balanced parenthesis representation. The key to the proposal is the observation that every micro tree splits into at most two intervals on the BP representation. Utilizing the representation, we propose several data structures that represent a tree and its tree cover, which consequently allow micro tree compression with arbitrary coding and efficient tree navigational queries. We also applied our data structure to average-case optimal RMQ by Munro et al.~[ESA 2021] and implemented the RMQ data structure. Our RMQ data structures spend less than $2n$ bits and process queries in a practical time on several settings of the performance evaluation, reducing the gap between theoretical space complexity and actual space consumption. We also implement tree navigational operations while using the same amount of space as the RMQ data structures. We believe the representation can be widely utilized for designing practically memory-efficient data structures based on tree covering.
[5] arXiv:2407.00586 [pdf, html, other]: Title: A Parameterized Algorithm for Vertex and Edge Connectivity of Embedded Graphs

Therese Biedl, Prosenjit Bose, Karthik Murali

Subjects: Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)

The problem of computing vertex and edge connectivity of a graph are classical problems in algorithmic graph theory. The focus of this paper is on computing these parameters on embedded graphs. A typical example of an embedded graph is a planar graph which can be drawn with no edge crossings. It has long been known that vertex and edge connectivity of planar embedded graphs can be computed in linear time. Very recently, Biedl and Murali extended the techniques from planar graphs to 1-plane graphs without $\times$-crossings, i.e., crossings whose endpoints induce a matching. While the tools used were novel, they were highly tailored to 1-plane graphs, and do not provide much leeway for further extension. In this paper, we develop alternate techniques that are simpler, have wider applications to near-planar graphs, and can be used to test both vertex and edge connectivity. Our technique works for all those embedded graphs where any pair of crossing edges are connected by a path that, roughly speaking, can be covered with few cells of the drawing. Important examples of such graphs include optimal 2-planar and optimal 3-planar graphs, $d$-map graphs, $d$-framed graphs, graphs with bounded crossing number, and $k$-plane graphs with bounded number of $\times$-crossings.
[6] arXiv:2407.00734 [pdf, html, other]: Title: Balanced Learned Sort: a new learned model for fast and balanced item bucketing

Paolo Ferragina, Mattia Odorisio

Subjects: Data Structures and Algorithms (cs.DS)

This paper aims to better understand the strengths and limitations of adopting learned-based approaches in sequential sorting numerical data, via two main research steps.
First, we study different learned models for distribution-based sorting, starting from some known ones (i.e., two-layer RMI or simple linear models) and then introducing some novel models that either improve the two-layer RMI or are fully new in their algorithmic structure thus resulting space efficient, monotonic, and very fast in building balanced buckets. We test those models over 11 synthetic datasets drawn from different distributions of 200M 64-bit floating-point items, so deriving hints about their ultimate performance and usefulness in designing a sorting algorithm.
Based on these findings, we select and plug the best models from above in a new learned-based algorithmic scheme and devise three new sorters that we will test against other 6 sequential sorters (5 classic and 1 learned, known and new ones) over 33 datasets (11 synthetic and 22 real), whose size will be up to 800M items. Our experimental figures will show that our learned sorters achieve superior performance on 31 out of all 33 datasets (synthetic and real). In conclusion, these experimental results provide, on the one hand, a comprehensive answer to the main question: Which algorithmic structure for distribution-based sorting is suited to leverage a learned model in order to achieve efficient performance? and, on the other hand, they leave open several other research and engineering questions about the design of a highly performing sequential sorter that is robust over different input distributions.
[7] arXiv:2407.01052 [pdf, html, other]: Title: Efficient algorithms for computing bisimulations for nondeterministic fuzzy transition systems

Linh Anh Nguyen

Subjects: Data Structures and Algorithms (cs.DS)

Fuzzy transition systems offer a robust framework for modeling and analyzing systems with inherent uncertainties and imprecision, which are prevalent in real-world scenarios. As their extension, nondeterministic fuzzy transition systems (NFTSs) have been studied in a considerable number of works. Wu et al. (2018) provided an algorithm for computing the greatest crisp bisimulation of a finite NFTS $\mathcal{S} = \langle S, A, \delta \rangle$, with a time complexity of order $O(|S|^4 \cdot |\delta|^2)$ under the assumption that $|\delta| \geq |S|$. Qiao {\em et al.} (2023) provided an algorithm for computing the greatest fuzzy bisimulation of a finite NFTS $\mathcal{S}$ under the Gödel semantics, with a time complexity of order $O(|S|^4 \cdot |\delta|^2 \cdot l)$ under the assumption that $|\delta| \geq |S|$, where $l$ is the number of fuzzy values used in $\mathcal{S}$ plus 1. In this work, we provide efficient algorithms for computing the partition corresponding to the greatest crisp bisimulation of a finite NFTS $\mathcal{S}$, as well as the compact fuzzy partition corresponding to the greatest fuzzy bisimulation of $\mathcal{S}$ under the Gödel semantics. Their time complexities are of the order $O((size(\delta) \log{l} + |S|) \log{(|S| + |\delta|)})$, where $l$ is the number of fuzzy values used in $\mathcal{S}$ plus 2. When $|\delta| \geq |S|$, this order is within $O(|S| \cdot |\delta| \cdot \log^2{|\delta|})$. The reduction of time complexity from $O(|S|^4 \cdot |\delta|^2)$ and $O(|S|^4 \cdot |\delta|^2 \cdot l)$ to $O(|S| \cdot |\delta| \cdot \log^2{|\delta|})$ is a significant contribution of this work. In addition, we introduce nondeterministic fuzzy labeled transition systems, which extend NFTSs with fuzzy state labels, and we define and provide results on simulations and bisimulations between them.
[8] arXiv:2407.01071 [pdf, html, other]: Title: Linear-Time MaxCut in Multigraphs Parameterized Above the Poljak-Turz\'ik Bound

Jonas Lill, Kalina Petrova, Simon Weber

Comments: 20 pages

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM)

MaxCut is a classical NP-complete problem and a crucial building block in many combinatorial algorithms. The famous Edwards-Erdős bound states that any connected graph on n vertices with m edges contains a cut of size at least $m/2 + (n-1)/4$. Crowston, Jones and Mnich [Algorithmica, 2015] showed that the MaxCut problem on simple connected graphs admits an FPT algorithm, where the parameter k is the difference between the desired cut size c and the lower bound given by the Edwards-Erdős bound. This was later improved by Etscheid and Mnich [Algorithmica, 2017] to run in parameterized linear time, i.e., $f(k)\cdot O(m)$. We improve upon this result in two ways: Firstly, we extend the algorithm to work also for multigraphs (alternatively, graphs with positive integer weights). Secondly, we change the parameter; instead of the difference to the Edwards-Erdős bound, we use the difference to the Poljak-Turzík bound. The Poljak-Turzík bound states that any weighted graph G has a cut of size at least $w(G)/2 + w_{MSF}(G)/4$, where w(G) denotes the total weight of G, and $w_{MSF}(G)$ denotes the weight of its minimum spanning forest. In connected simple graphs the two bounds are equivalent, but for multigraphs the Poljak-Turzík bound can be larger and thus yield a smaller parameter k. Our algorithm also runs in parameterized linear time, i.e., $f(k)\cdot O(m+n)$.
[9] arXiv:2407.01431 [pdf, html, other]: Title: Graph Spanners for Group Steiner Distances

Davide Bilò, Luciano Gualà, Stefano Leucci, Alessandro Straziota

Subjects: Data Structures and Algorithms (cs.DS)

A spanner is a sparse subgraph of a given graph $G$ which preserves distances, measured w.r.t.\ some distance metric, up to a multiplicative stretch factor. This paper addresses the problem of constructing graph spanners w.r.t.\ the group Steiner metric, which generalizes the recently introduced beer distance metric. In such a metric we are given a collection of groups of required vertices, and we measure the distance between two vertices as the length of the shortest path between them that traverses at least one required vertex from each group.
We discuss the relation between group Steiner spanners and classic spanners and we show that they exhibit strong ties with sourcewise spanners w.r.t.\ the shortest path metric. Nevertheless, group Steiner spanners capture several interesting scenarios that are not encompassed by existing spanners. This happens, e.g., for the singleton case, in which each group consists of a single required vertex, thus modeling the setting in which routes need to traverse certain points of interests (in any order).
We provide several constructions of group Steiner spanners for both the all-pairs and single-source case, which exhibit various size-stretch trade-offs. Notably, we provide spanners with almost-optimal trade-offs for the singleton case. Moreover, some of our spanners also yield novel trade-offs for classical sourcewise spanners.
Finally, we also investigate the query times that can be achieved when our spanners are turned into group Steiner distance oracles with the same size, stretch, and building time.
[10] arXiv:2407.01465 [pdf, html, other]: Title: Kick the cliques

Gaétan Berthe, Marin Bougeret, Daniel Gonçalves, Jean-Florent Raymond

Subjects: Data Structures and Algorithms (cs.DS)

In the $K_r$-Cover problem, given a graph $G$ and an integer $k$ one has to decide if there exists a set of at most $k$ vertices whose removal destroys all $r$-cliques of $G$. In this paper we give an algorithm for $K_r$-Cover that runs in subexponential FPT time on graph classes satisfying two simple conditions related to cliques and treewidth. As an application we show that our algorithm solves $K_r$-Cover in time
* $2^{O_r\left (k^{(r+1)/(r+2)}\log k \right)} \cdot n^{O_r(1)}$ in pseudo-disk graphs and map-graphs;
* $2^{O_{t,r}(k^{2/3}\log k)} \cdot n^{O_r(1)}$ in $K_{t,t}$-subgraph-free string graphs; and
* $2^{O_{H,r}(k^{2/3}\log k)} \cdot n^{O_r(1)}$ in $H$-minor-free graphs.

[11] arXiv:2407.00041 (cross-list from hep-lat) [pdf, other]: Title: Accelerating Lattice QCD Simulations using GPUs

Tilmann Matthaei

Comments: source code available: this https URL

Subjects: High Energy Physics - Lattice (hep-lat); Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

Solving discretized versions of the Dirac equation represents a large share of execution time in lattice Quantum Chromodynamics (QCD) simulations. Many high-performance computing (HPC) clusters use graphics processing units (GPUs) to offer more computational resources. Our solver program, DDalphaAMG, previously was unable to fully take advantage of GPUs to accelerate its computations. Making use of GPUs for DDalphaAMG is an ongoing development, and we will present some current progress herein. Through a detailed description of our development, this thesis should offer valuable insights into using GPUs to accelerate a memory-bound CPU implementation.
We developed a storage scheme for multiple tuples, which allows much more efficient memory access on GPUs, given that the element at the same index is read from multiple tuples simultaneously. Still, our implementation of a discrete Dirac operator is memory-bound, and we only achieved improvements for large linear systems on few nodes at the JUWELS cluster. These improvements do not currently overcome additional introduced overheads. However, the results for the application of the Wilson-Dirac operator show a speedup of around 3 for large lattices. If the additional overheads can be eliminated in the future, GPUs could reduce the DDalphaAMG execution time significantly for large lattices.
We also found that a previous publication on the GPU acceleration of DDalphaAMG, underrepresented the achieved speedup, because small lattices were used. This further highlights that GPUs often require large-scale problems to solve in order to be faster than CPUs
[12] arXiv:2407.00251 (cross-list from cs.RO) [pdf, html, other]: Title: Leveraging Fixed-Parameter Tractability for Robot Inspection Planning

Yosuke Mizutani, Daniel Coimbra Salomao, Alex Crane, Matthias Bentert, Pål Grønås Drange, Felix Reidl, Alan Kuntz, Blair D. Sullivan

Subjects: Robotics (cs.RO); Data Structures and Algorithms (cs.DS)

Autonomous robotic inspection, where a robot moves through its environment and inspects points of interest, has applications in industrial settings, structural health monitoring, and medicine. Planning the paths for a robot to safely and efficiently perform such an inspection is an extremely difficult algorithmic challenge. In this work we consider an abstraction of the inspection planning problem which we term Graph Inspection. We give two exact algorithms for this problem, using dynamic programming and integer linear programming. We analyze the performance of these methods, and present multiple approaches to achieve scalability. We demonstrate significant improvement both in path weight and inspection coverage over a state-of-the-art approach on two robotics tasks in simulation, a bridge inspection task by a UAV and a surgical inspection task using a medical robot.
[13] arXiv:2407.00329 (cross-list from cs.CG) [pdf, html, other]: Title: On Line-Separable Weighted Unit-Disk Coverage and Related Problems

Gang Liu, Haitao Wang

Comments: To appear in MFCS 2024

Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

Given a set $P$ of $n$ points and a set $S$ of $n$ weighted disks in the plane, the disk coverage problem is to compute a subset of disks of smallest total weight such that the union of the disks in the subset covers all points of $P$. The problem is NP-hard. In this paper, we consider a line-separable unit-disk version of the problem where all disks have the same radius and their centers are separated from the points of $P$ by a line $\ell$. We present an $O(n^{3/2}\log^2 n)$ time algorithm for the problem. This improves the previously best work of $O(n^2\log n)$ time. Our result leads to an algorithm of $O(n^{{7}/{2}}\log^2 n)$ time for the halfplane coverage problem (i.e., using $n$ weighted halfplanes to cover $n$ points), an improvement over the previous $O(n^4\log n)$ time solution. If all halfplanes are lower ones, our algorithm runs in $O(n^{{3}/{2}}\log^2 n)$ time, while the previous best algorithm takes $O(n^2\log n)$ time. Using duality, the hitting set problems under the same settings can be solved with similar time complexities.
[14] arXiv:2407.00331 (cross-list from cs.CG) [pdf, html, other]: Title: Unweighted Geometric Hitting Set for Line-Constrained Disks and Related Problems

Gang Liu, Haitao Wang

Comments: To appear in MFCS 2024

Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)

Given a set $P$ of $n$ points and a set $S$ of $m$ disks in the plane, the disk hitting set problem asks for a smallest subset of $P$ such that every disk of $S$ contains at least one point in the subset. The problem is NP-hard. In this paper, we consider a line-constrained version in which all disks have their centers on a line. We present an $O(m\log^2n+(n+m)\log(n+m))$ time algorithm for the problem. This improves the previously best result of $O(m^2\log m+(n+m)\log(n+m))$ time for the weighted case of the problem where every point of $P$ has a weight and the objective is to minimize the total weight of the hitting set. Our algorithm actually solves a more general line-separable problem with a single intersection property: The points of $P$ and the disk centers are separated by a line $\ell$ and the boundary of every two disks intersect at most once on the side of $\ell$ containing $P$.
[15] arXiv:2407.00694 (cross-list from math.CO) [pdf, html, other]: Title: Enumeration of minimal transversals of hypergraphs of bounded VC-dimension

Arnaud Mary

Subjects: Combinatorics (math.CO); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)

We consider the problem of enumerating all minimal transversals (also called minimal hitting sets) of a hypergraph $\mathcal{H}$. An equivalent formulation of this problem known as the \emph{transversal hypergraph} problem (or \emph{hypergraph dualization} problem) is to decide, given two hypergraphs, whether one corresponds to the set of minimal transversals of the other. The existence of a polynomial time algorithm to solve this problem is a long standing open question. In \cite{fredman_complexity_1996}, the authors present the first sub-exponential algorithm to solve the transversal hypergraph problem which runs in quasi-polynomial time, making it unlikely that the problem is (co)NP-complete.
In this paper, we show that when one of the two hypergraphs is of bounded VC-dimension, the transversal hypergraph problem can be solved in polynomial time, or equivalently that if $\mathcal{H}$ is a hypergraph of bounded VC-dimension, then there exists an incremental polynomial time algorithm to enumerate its minimal transversals. This result generalizes most of the previously known polynomial cases in the literature since they almost all consider classes of hypergraphs of bouded VC-dimension. As a consequence, the hypergraph transversal problem is solvable in polynomial time for any class of hypergraphs closed under partial subhypergraphs. We also show that the proposed algorithm runs in quasi-polynomial time in general hypergraphs and runs in polynomial time if the conformality of the hypergraph is bounded, which is one of the few known polynomial cases where the VC-dimension is unbounded.
[16] arXiv:2407.00868 (cross-list from math.PR) [pdf, other]: Title: Sampling from the Continuous Random Energy Model in Total Variation Distance

Holden Lee, Qiang Wu

Subjects: Probability (math.PR); Data Structures and Algorithms (cs.DS)

The continuous random energy model (CREM) is a toy model of spin glasses on $\{0,1\}^N$ that, in the limit, exhibits an infinitely hierarchical correlation structure. We give two polynomial-time algorithms to approximately sample from the Gibbs distribution of the CREM in the high-temperature regime, based on a Markov chain and a sequential sampler. The running time depends algebraically on the desired TV distance and failure probability and exponentially in $(1/g')^{O(1)}$, where $g'$ is the gap to a certain inverse temperature threshold; this contrasts with previous results which only attain $o(N)$ accuracy in KL divergence. If the covariance function $A$ of the CREM is concave, the algorithms work up to the critical threshold $\beta_c$, which is the static phase transition point; moreover, for certain $A$, the algorithms work up to the known algorithmic threshold $\beta_G$ proposed in Addario-Berry and Maillard (2020) for non-trivial sampling guarantees. Our result depends on quantitative bounds for the fluctuation of the partition function and a new contiguity result of the ``tilted" CREM obtained from sampling, which is of independent interest. We also show that the spectral gap is exponentially small with high probability, suggesting that the algebraic dependence is unavoidable with a Markov chain approach.
[17] arXiv:2407.00871 (cross-list from cs.DC) [pdf, other]: Title: A Reexamination of the Communication Bandwidth Cost Analysis of A Parallel Recursive Algorithm for Solving Triangular Systems of Linear Equations

Yuan Tang

Comments: 2 pages, comment on arXiv:1612.01855

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS); Numerical Analysis (math.NA)

This paper presents a reexamination of the research paper titled "Communication-Avoiding Parallel Algorithms for \proc{TRSM}" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the original work and identify potential issues that require clarification or revision. The problem at hand is the need to address inconsistencies and miscalculations found in the analysis, particularly in the categorization of costs into three scenarios based on the relationship between matrix dimensions and processor count. Our findings contribute to the ongoing discourse in the field and pave the way for further improvements in this area of research.
[18] arXiv:2407.01402 (cross-list from cs.CC) [pdf, html, other]: Title: Superconstant Inapproximability of Decision Tree Learning

Caleb Koch, Carmen Strassle, Li-Yang Tan

Comments: 29 pages, 5 figures, COLT 2024

Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

We consider the task of properly PAC learning decision trees with queries. Recent work of Koch, Strassle, and Tan showed that the strictest version of this task, where the hypothesis tree $T$ is required to be optimally small, is NP-hard. Their work leaves open the question of whether the task remains intractable if $T$ is only required to be close to optimal, say within a factor of 2, rather than exactly optimal.
We answer this affirmatively and show that the task indeed remains NP-hard even if $T$ is allowed to be within any constant factor of optimal. More generally, our result allows for a smooth tradeoff between the hardness assumption and the inapproximability factor. As Koch et al.'s techniques do not appear to be amenable to such a strengthening, we first recover their result with a new and simpler proof, which we couple with a new XOR lemma for decision trees. While there is a large body of work on XOR lemmas for decision trees, our setting necessitates parameters that are extremely sharp, and are not known to be attainable by existing XOR lemmas. Our work also carries new implications for the related problem of Decision Tree Minimization.

[19] arXiv:2211.12441 (replaced) [pdf, html, other]: Title: Lower Bound Techniques in the Comparison-Query Model and Inversion Minimization on Trees

Ivan Hu, Dieter van Melkebeek, Andrew Morgan

Comments: 55 pages, 18 figures, conference version of paper appeared in the Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms

Subjects: Data Structures and Algorithms (cs.DS)

Given a rooted tree and a ranking of its leaves, what is the minimum number of inversions of the leaves that can be attained by ordering the tree? This variation of the problem of counting inversions in arrays originated in mathematical psychology, with the evaluation of the Mann--Whitney statistic for detecting differences between distributions as a special case.
We study the complexity of the problem in the comparison-query model, used for problems like sorting and selection. For many types of trees with $n$ leaves, we establish lower bounds close to the strongest known in the model, namely the lower bound of $\log_2(n!)$ for sorting $n$ items. We show:
(a) $\log_2((\alpha(1-\alpha)n)!) - O(\log n)$ queries are needed whenever the tree has a subtree that contains a fraction $\alpha$ of the leaves. This implies a lower bound of $\log_2((\frac{k}{(k+1)^2}n)!) - O(\log n)$ for trees of degree $k$.
(b) $\log_2(n!) - O(\log n)$ queries are needed in case the tree is binary.
(c) $\log_2(n!) - O(k \log k)$ queries are needed for certain classes of trees of degree $k$, including perfect trees with even $k$.
The lower bounds are obtained by developing two novel techniques for a generic problem $\Pi$ in the comparison-query model and applying them to inversion minimization on trees. Both techniques can be described in terms of the Cayley graph of the symmetric group with adjacent-rank transpositions as the generating set. Consider the subgraph consisting of the edges between vertices with the same value under $\Pi$. We show that the size of any decision tree for $\Pi$ must be at least:
(i) the number of connected components of this subgraph, and
(ii) the factorial of the average degree of the complementary subgraph, divided by $n$.
Lower bounds on query complexity then follow by taking the base-2 logarithm.
[20] arXiv:2308.06629 (replaced) [pdf, html, other]: Title: Optimal FIFO grouping in public transit networks

Patrick Steil

Comments: 1 page, 0 figures

Subjects: Data Structures and Algorithms (cs.DS)

This technical report is about grouping vehicles in public transport into routes so that two vehicles of a route do not overtake each other. We say that such a set of routes satisfies the FIFO property. A natural question is: Given a set of trips, find a minimal FIFO grouping into routes. This question is especially interesting for route planning algorithms since a better route grouping leads to a better runtime.
[21] arXiv:2308.14727 (replaced) [pdf, html, other]: Title: Faster Min-Cost Flow and Approximate Tree Decomposition on Bounded Treewidth Graphs

Sally Dong, Guanghao Ye

Comments: 15 pages, to appear at ESA 2024

Subjects: Data Structures and Algorithms (cs.DS)

We present an algorithm for min-cost flow in graphs with $n$ vertices and $m$ edges, given a tree decomposition of width $\tau$ and size $S$, and polynomially bounded, integral edge capacities and costs, running in $\widetilde{O}(m\sqrt{\tau} + S)$ time. This improves upon the previous fastest algorithm in this setting achieved by the bounded-treewidth linear program solver by [Dong-Lee-Ye,21] and [Gu-Song,22], which runs in $\widetilde{O}(m \tau^{(\omega+1)/2})$ time, where $\omega \approx 2.37$ is the matrix multiplication exponent. Our approach leverages recent advances in structured linear program solvers and robust interior point methods (IPM). For general graphs where treewidth is trivially bounded by $n$, the algorithm runs in $\widetilde{O}(m \sqrt n)$ time, which is the best-known result without using the Lee-Sidford barrier or $\ell_1$ IPM, demonstrating the surprising power of robust interior point methods.
As a corollary, we obtain a $\widetilde{O}(\operatorname{tw}^3 \cdot m)$ time algorithm to compute a tree decomposition of width $O(\operatorname{tw}\cdot \log(n))$, given a graph with $m$ edges.
[22] arXiv:2309.09134 (replaced) [pdf, html, other]: Title: Total Variation Distance Meets Probabilistic Inference

Arnab Bhattacharyya, Sutanu Gayen, Kuldeep S. Meel, Dimitrios Myrisiotis, A. Pavan, N. V. Vinodchandran

Comments: 25 pages. This work has been accepted for presentation at the International Conference on Machine Learning (ICML) 2024

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Discrete Mathematics (cs.DM); Machine Learning (cs.LG)

In this paper, we establish a novel connection between total variation (TV) distance estimation and probabilistic inference. In particular, we present an efficient, structure-preserving reduction from relative approximation of TV distance to probabilistic inference over directed graphical models. This reduction leads to a fully polynomial randomized approximation scheme (FPRAS) for estimating TV distances between same-structure distributions over any class of Bayes nets for which there is an efficient probabilistic inference algorithm. In particular, it leads to an FPRAS for estimating TV distances between distributions that are defined over a common Bayes net of small treewidth. Prior to this work, such approximation schemes only existed for estimating TV distances between product distributions. Our approach employs a new notion of $partial$ couplings of high-dimensional distributions, which might be of independent interest.
[23] arXiv:2312.03681 (replaced) [pdf, html, other]: Title: Testing Connectedness of Images

Piotr Berman, Meiram Murzabulatov, Sofya Raskhodnikova, Dragos-Florian Ristache

Subjects: Data Structures and Algorithms (cs.DS)

We investigate algorithms for testing whether an image is connected. Given a proximity parameter $\epsilon\in(0,1)$ and query access to a black-and-white image represented by an $n\times n$ matrix of Boolean pixel values, a (1-sided error) connectedness tester accepts if the image is connected and rejects with probability at least 2/3 if the image is $\epsilon$-far from connected. We show that connectedness can be tested nonadaptively with $O(\frac 1{\epsilon^2})$ queries and adaptively with $O(\frac{1}{\epsilon^{3/2}} \sqrt{\log\frac{1}{\epsilon}})$ queries. The best connectedness tester to date, by Berman, Raskhodnikova, and Yaroslavtsev (STOC 2014) had query complexity $O(\frac 1{\epsilon^2}\log \frac 1{\epsilon})$ and was adaptive. We also prove that every nonadaptive, 1-sided error tester for connectedness must make $\Omega(\frac 1\epsilon\log \frac 1\epsilon)$ queries.
[24] arXiv:2312.08489 (replaced) [pdf, html, other]: Title: Connectivity Oracles for Predictable Vertex Failures

Bingbing Hu, Evangelos Kosinas, Adam Polak

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

The problem of designing connectivity oracles supporting vertex failures is one of the basic data structures problems for undirected graphs. It is already well understood: previous works [Duan--Pettie STOC'10; Long--Saranurak FOCS'22] achieve query time linear in the number of failed vertices, and it is conditionally optimal as long as we require preprocessing time polynomial in the size of the graph and update time polynomial in the number of failed vertices.
We revisit this problem in the paradigm of algorithms with predictions: we ask if the query time can be improved if the set of failed vertices can be predicted beforehand up to a small number of errors. More specifically, we design a data structure that, given a graph $G=(V,E)$ and a set of vertices predicted to fail $\widehat{D} \subseteq V$ of size $d=|\widehat{D}|$, preprocesses it in time $\tilde{O}(d|E|)$ and then can receive an update given as the symmetric difference between the predicted and the actual set of failed vertices $\widehat{D} \triangle D = (\widehat{D} \setminus D) \cup (D \setminus \widehat{D})$ of size $\eta = |\widehat{D} \triangle D|$, process it in time $\tilde{O}(\eta^4)$, and after that answer connectivity queries in $G \setminus D$ in time $O(\eta)$. Viewed from another perspective, our data structure provides an improvement over the state of the art for the \emph{fully dynamic subgraph connectivity problem} in the \emph{sensitivity setting} [Henzinger--Neumann ESA'16].
We argue that the preprocessing time and query time of our data structure are conditionally optimal under standard fine-grained complexity assumptions.
[25] arXiv:2402.10015 (replaced) [pdf, other]: Title: A Piecewise Approach for the Analysis of Exact Algorithms

Katie Clinch, Serge Gaspers, Zixu He, Abdallah Saffidine, Tiankuang Zhang

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

To analyze the worst-case running time of branching algorithms, the majority of work in exponential time algorithms focuses on designing complicated branching rules over developing better analysis methods for simple algorithms. In the mid-$2000$s, Fomin et al. [2005] introduced measure & conquer, an advanced general analysis method, sparking widespread adoption for obtaining tighter worst-case running time upper bounds for many fundamental NP-complete problems. Yet, much potential in this direction remains untapped, as most subsequent work applied it without further advancement. Motivated by this, we present piecewise analysis, a new general method that analyzes the running time of branching algorithms. Our approach is to define a similarity ratio that divides instances into groups and then analyze the running time within each group separately. The similarity ratio is a scale between two parameters of an instance I. Instead of relying on a single measure and a single analysis for the whole instance space, our method allows to take advantage of different intrinsic properties of instances with different similarity ratios. To showcase its potential, we reanalyze two $17$-year-old algorithms from Fomin et al. [2007] that solve $4$-Coloring and #$3$-Coloring respectively. The original analysis in their paper gave running times of $O(1.7272^n)$ and $O(1.6262^n)$ respectively for these algorithms, our analysis improves these running times to $O(1.7207^n)$ and $O(1.6225^n)$.
[26] arXiv:2403.02008 (replaced) [pdf, html, other]: Title: How to Find Long Maximal Exact Matches and Ignore Short Ones

Travis Gagie

Subjects: Data Structures and Algorithms (cs.DS)

Finding maximal exact matches (MEMs) between strings is an important task in bioinformatics, but it is becoming increasingly challenging as geneticists switch to pangenomic references. Fortunately, we are usually interested only in the relatively few MEMs that are longer than we would expect by chance. In this paper we show that under reasonable assumptions we can find all MEMs of length at least $L$ between a pattern of length $m$ and a text of length $n$ in $O (m)$ time plus extra $O (\log n)$ time only for each MEM of length at least nearly $L$ using a compact index for the text, suitable for pangenomics.
[27] arXiv:2403.08394 (replaced) [pdf, html, other]: Title: Worst-Case to Expander-Case Reductions: Derandomized and Generalized

Amir Abboud, Nathan Wallheimer

Comments: Full version of a paper to appear at ESA 2024

Subjects: Data Structures and Algorithms (cs.DS)

A recent paper by Abboud and Wallheimer [ITCS 2023] presents self-reductions for various fundamental graph problems, which transform worst-case instances to expanders, thus proving that the complexity remains unchanged if the input is assumed to be an expander. An interesting corollary of their self-reductions is that if some problem admits such reduction, then the popular algorithmic paradigm based on expander-decompositions is useless against it. In this paper, we improve their core gadget, which augments a graph to make it an expander while retaining its important structure. Our new core construction has the benefit of being simple to analyze and generalize while obtaining the following results:
1. A derandomization of the self-reductions, showing that the equivalence between worst-case and expander-case holds even for deterministic algorithms, and ruling out the use of expander-decompositions as a derandomization tool.
2. An extension of the results to other models of computation, such as the Fully Dynamic model and the Congested Clique model. In the former, we either improve or provide an alternative approach to some recent hardness results for dynamic expander graphs by Henzinger, Paz, and Sricharan [ESA 2022].
In addition, we continue this line of research by designing new self-reductions for more problems, such as Max-Cut and dynamic Densest Subgraph, and demonstrating that the core gadget can be utilized to lift lower bounds based on the OMv Conjecture to expanders.
[28] arXiv:2403.17480 (replaced) [pdf, html, other]: Title: Capacity Provisioning Motivated Online Non-Convex Optimization Problem with Memory and Switching Cost

Rahul Vaze, Jayakrishnan Nair

Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

An online non-convex optimization problem is considered where the goal is to minimize the flow time (total delay) of a set of jobs by modulating the number of active servers, but with a switching cost associated with changing the number of active servers over time. Each job can be processed by at most one fixed speed server at any time. Compared to the usual online convex optimization (OCO) problem with switching cost, the objective function considered is non-convex and more importantly, at each time, it depends on all past decisions and not just the present one. Both worst-case and stochastic inputs are considered; for both cases, competitive algorithms are derived.
[29] arXiv:2404.05271 (replaced) [pdf, html, other]: Title: Scheduling Multi-Server Jobs is Not Easy

Rahul Vaze

Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)

The problem of online scheduling of multi-server jobs is considered, where there are a total of $K$ servers, and each job requires concurrent service from multiple servers for it to be processed. Each job on its arrival reveals its processing time, the number of servers from which it needs concurrent service and an online algorithm has to make scheduling decisions using only causal information, with the goal of minimizing the response/flow time. The worst case input model is considered and the performance metric is the competitive ratio. For the case, when all job processing time (sizes) are the same, we show that the competitive ratio of any deterministic/randomized algorithm is at least $\Omega(K)$ and propose an online algorithm whose competitive ratio is at most $K+1$. With equal job sizes, we also consider the resource augmentation regime where an online algorithm has access to more servers than an optimal offline algorithm. With resource augmentation, we propose a simple algorithm and show that it has a competitive ratio of $1$ when provided with $2K$ servers with respect to an optimal offline algorithm with $K$ servers. With unequal job sizes, we propose an online algorithm whose competitive ratio is at most $2K \log (K w_{\max})$, where $w_{\max}$ is the maximum size of any job.
[30] arXiv:2404.05681 (replaced) [pdf, html, other]: Title: Even Faster Knapsack via Rectangular Monotone Min-Plus Convolution and Balancing

Karl Bringmann, Anita Dürr, Adam Polak

Subjects: Data Structures and Algorithms (cs.DS)

We present a pseudopolynomial-time algorithm for the Knapsack problem that has running time $\widetilde{O}(n + t\sqrt{p_{\max}})$, where $n$ is the number of items, $t$ is the knapsack capacity, and $p_{\max}$ is the maximum item profit. This improves over the $\widetilde{O}(n + t \, p_{\max})$-time algorithm based on the convolution and prediction technique by Bateni et al.~(STOC 2018). Moreover, we give some evidence, based on a strengthening of the Min-Plus Convolution Hypothesis, that our running time might be optimal.
Our algorithm uses two new technical tools, which might be of independent interest. First, we generalize the $\widetilde{O}(n^{1.5})$-time algorithm for bounded monotone min-plus convolution by Chi et al.~(STOC 2022) to the \emph{rectangular} case where the range of entries can be different from the sequence length. Second, we give a reduction from general knapsack instances to \emph{balanced} instances, where all items have nearly the same profit-to-weight ratio, up to a constant factor.
Using these techniques, we can also obtain algorithms that run in time $\widetilde{O}(n + OPT\sqrt{w_{\max}})$, $\widetilde{O}(n + (nw_{\max}p_{\max})^{1/3}t^{2/3})$, and $\widetilde{O}(n + (nw_{\max}p_{\max})^{1/3} OPT^{2/3})$, where $OPT$ is the optimal total profit and $w_{\max}$ is the maximum item weight.
[31] arXiv:2404.10023 (replaced) [pdf, html, other]: Title: Parameterized Algorithms for Editing to Uniform Cluster Graph

Ajinkya Gaikwad, Hitendra Kumar, Soumen Maity

Subjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)

Given a graph $G=(V,E)$ and an integer $k\in \mathbb{N}$, we investigate the 2-Eigenvalue Vertex Deletion (2-EVD) problem. The objective is to remove at most $k$ vertices such that the adjacency matrix of the resulting graph has at most two eigenvalues. It is established that the adjacency matrix of a graph has at most two eigenvalues if and only if the graph is a collection of equal-sized cliques. Thus, the 2-Eigenvalue Vertex Deletion amounts to removing a set of at most $k$ vertices to transform the graph into a collection of equal-sized cliques. The 2-Eigenvalue Edge Editing (2-EEE), 2-Eigenvalue Edge Deletion (2-EED) and 2-Eigenvalue Edge Addition (2-EEA) problems are defined analogously. We present a kernel of size $\mathcal{O}(k^{3})$ for $2$-EVD, along with an FPT algorithm with a running time of $\mathcal{O}^{*}(2^{k})$. For the problem $2$-EEE, we provide a kernel of size $\mathcal{O}(k^{2})$. Additionally, we present linear kernels of size $5k$ and $6k$ for $2$-EEA and $2$-EED respectively. For the $2$-EED, we also construct an algorithm with running time $\mathcal{O}^{*}(1.47^{k})$ . These results address open questions posed by Misra et al. (ISAAC 2023) regarding the complexity of these problems when parameterized by the solution size.
[32] arXiv:2404.18783 (replaced) [pdf, html, other]: Title: Improved bounds for group testing in arbitrary hypergraphs

Annalisa De Bonis

Comments: arXiv admin note: text overlap with arXiv:2307.09608

Subjects: Data Structures and Algorithms (cs.DS)

Recent papers initiated the study of a generalization of group testing where the potentially contaminated sets are the members of a given hypergraph F=(V,E). This generalization finds application in contexts where contaminations can be conditioned by some kinds of social and geographical clusterings. The paper focuses on few-stage group testing algorithms, i.e., slightly adaptive algorithms where tests are performed in stages and all tests performed in the same stage should be decided at the very beginning of the stage. In particular, the paper presents the first two-stage algorithm that uses o(dlog|E|) tests for general hypergraphs with hyperedges of size at most d, and a three-stage algorithm that improves by a d^{1/6} factor on the number of tests of the best known three-stage algorithm. These algorithms are special cases of an s-stage algorithm designed for an arbitrary positive integer s<= d. The design of this algorithm resort to a new non-adaptive algorithm (one-stage algorithm), i.e., an algorithm where all tests must be decided beforehand. Further, we derive a lower bound for non-adaptive group testing. For E sufficiently large, the lower bound is very close to the upper bound on the number of tests of the best non-adaptive group testing algorithm known in the literature, and it is the first lower bound that improves on the information theoretic lower bound Omega(log |E|).
[33] arXiv:2405.09338 (replaced) [pdf, html, other]: Title: Interval Selection in Sliding Windows

Cezar-Mihail Alexandru, Christian Konrad

Comments: 22 pages, 6 figures

Subjects: Data Structures and Algorithms (cs.DS)

We initiate the study of the Interval Selection problem in the (streaming) sliding window model of computation.
In this problem, an algorithm receives a potentially infinite stream of intervals on the line, and the objective is to maintain at every moment an approximation to a largest possible subset of disjoint intervals among the $L$ most recent intervals, for some integer $L$.
We give the following results:
- In the unit-length intervals case, we give a $2$-approximation sliding window algorithm with space $\tilde{\mathrm{O}}(|OPT|)$, and we show that any sliding window algorithm that computes a $(2-\varepsilon)$-approximation requires space $\Omega(L)$, for any $\varepsilon > 0$.
- In the arbitrary-length case, we give a $(\frac{11}{3}+\varepsilon)$-approximation sliding window algorithm with space $\tilde{\mathrm{O}}(|OPT|)$, for any constant $\varepsilon > 0$, which constitutes our main result.
We also show that space $\Omega(L)$ is needed for algorithms that compute a $(2.5-\varepsilon)$-approximation, for any $\varepsilon > 0$.
Our main technical contribution is an improvement over the smooth histogram technique, which consists of running independent copies of a traditional streaming algorithm with different start times. By employing the one-pass $2$-approximation streaming algorithm by Cabello and Pérez-Lantero [Theor. Comput. Sci. '17] for Interval Selection on arbitrary-length intervals as the underlying algorithm, the smooth histogram technique immediately yields a $(4+\varepsilon)$-approximation in this setting. Our improvement is obtained by forwarding the structure of the intervals identified in a run to the subsequent run, which constrains the shape of an optimal solution and allows us to target optimal intervals differently.
[34] arXiv:2206.05248 (replaced) [pdf, html, other]: Title: Accelerated Algorithms for Constrained Nonconvex-Nonconcave Min-Max Optimization and Comonotone Inclusion

Yang Cai, Argyris Oikonomou, Weiqiang Zheng

Comments: Accepted to ICML 2024

Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

We study constrained comonotone min-max optimization, a structured class of nonconvex-nonconcave min-max optimization problems, and their generalization to comonotone inclusion. In our first contribution, we extend the Extra Anchored Gradient (EAG) algorithm, originally proposed by Yoon and Ryu (2021) for unconstrained min-max optimization, to constrained comonotone min-max optimization and comonotone inclusion, achieving an optimal convergence rate of $O\left(\frac{1}{T}\right)$ among all first-order methods. Additionally, we prove that the algorithm's iterations converge to a point in the solution set. In our second contribution, we extend the Fast Extra Gradient (FEG) algorithm, as developed by Lee and Kim (2021), to constrained comonotone min-max optimization and comonotone inclusion, achieving the same $O\left(\frac{1}{T}\right)$ convergence rate. This rate is applicable to the broadest set of comonotone inclusion problems yet studied in the literature. Our analyses are based on simple potential function arguments, which might be useful for analyzing other accelerated algorithms.
[35] arXiv:2211.03997 (replaced) [pdf, html, other]: Title: Online Decision Making with Nonconvex Local and Convex Global Constraints

Rui Chen, Oktay Gunluk, Andrea Lodi, Guanyi Wang

Subjects: Optimization and Control (math.OC); Data Structures and Algorithms (cs.DS)

We study the online decision making problem (ODMP) as a natural generalization of online linear programming. In ODMP, a single decision maker undertakes a sequence of decisions over $T$ time steps. At each time step, the decision maker makes a locally feasible decision based on information available up to that point. The objective is to maximize the accumulated reward while satisfying some convex global constraints called goal constraints. The decision made at each step results in an $m$-dimensional vector that represents the contribution of this local decision to the goal constraints. In the online setting, these goal constraints are soft constraints that can be violated moderately. To handle potential nonconvexity and nonlinearity in ODMP, we propose a Fenchel dual-based online algorithm. At each time step, the algorithm requires solving a potentially nonconvex optimization problem over the local feasible set and a convex optimization problem over the goal set. Under certain stochastic input models, we show that the algorithm achieves $O(\sqrt{mT})$ goal constraint violation deterministically, and $\tilde{O}(\sqrt{mT})$ regret in expected reward. Numerical experiments on an online knapsack problem and an assortment optimization problem are conducted to demonstrate the potential of our proposed online algorithm.
[36] arXiv:2305.05074 (replaced) [pdf, html, other]: Title: Autumn: A Scalable Read Optimized LSM-tree based Key-Value Stores with Fast Point and Range Read Speed

Fuheng Zhao, Zach Miller, Leron Reznikov, Divyakant Agrawal, Amr El Abbadi

Subjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Information Retrieval (cs.IR)

The Log Structured Merge Trees (LSM-tree) based key-value stores are widely used in many storage systems to support a variety of operations such as updates, point reads, and range reads. Traditionally, LSM-tree's merge policy organizes data into multiple levels of exponentially increasing capacity to support high-speed writes. However, we contend that the traditional merge policies are not optimized for reads. In this work, we present Autumn, a scalable and read optimized LSM-tree based key-value stores with minimal point and range read cost. The key idea in improving the read performance is to dynamically adjust the capacity ratio between two adjacent levels as more data are stored. As a result, smaller levels gradually increase their capacities and merge more often. In particular, the point and range read cost improves from the previous best known $O(logN)$ complexity to $O(\sqrt{logN})$ in Autumn by applying the novel Garnering merge policy. While Garnering merge policy optimizes for both point reads and range reads, it maintains high performance for updates. Moreover, to further improve the update costs, Autumn uses a small amount of bounded space of DRAM to pin/keep the first level of LSM-tree. We implemented Autumn on top of LevelDB and experimentally showcases the gain in performance for real world workloads.
[37] arXiv:2402.03358 (replaced) [pdf, html, other]: Title: A Comprehensive Survey on Graph Reduction: Sparsification, Coarsening, and Condensation

Mohammad Hashemi, Shengbo Gong, Juntong Ni, Wenqi Fan, B. Aditya Prakash, Wei Jin

Comments: Accepted by IJCAI 2024 (This ArXiv version is a long version of our IJCAI paper)

Subjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)

Many real-world datasets can be naturally represented as graphs, spanning a wide range of domains. However, the increasing complexity and size of graph datasets present significant challenges for analysis and computation. In response, graph reduction, or graph summarization, has gained prominence for simplifying large graphs while preserving essential properties. In this survey, we aim to provide a comprehensive understanding of graph reduction methods, including graph sparsification, graph coarsening, and graph condensation. Specifically, we establish a unified definition for these methods and introduce a hierarchical taxonomy to categorize the challenges they address. Our survey then systematically reviews the technical details of these methods and emphasizes their practical applications across diverse scenarios. Furthermore, we outline critical research directions to ensure the continued effectiveness of graph reduction techniques, as well as provide a comprehensive paper list at \url{this https URL}. We hope this survey will bridge literature gaps and propel the advancement of this promising field.
[38] arXiv:2404.02673 (replaced) [pdf, html, other]: Title: History Trees and Their Applications

Giovanni Viglietta

Comments: 20 pages, 10 figures

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)

In the theoretical study of distributed communication networks, "history trees" are a discrete structure that naturally models the concept that anonymous agents become distinguishable upon receiving different sets of messages from neighboring agents. By conveniently organizing temporal information in a systematic manner, history trees have been instrumental in the development of optimal deterministic algorithms for networks that are both anonymous and dynamically evolving.
This note provides an accessible introduction to history trees, drawing comparisons with more traditional structures found in existing literature and reviewing the latest advancements in the applications of history trees, especially within dynamic networks. Furthermore, it expands the theoretical framework of history trees in new directions, also highlighting several open problems for further investigation.

Total of 38 entries

Showing up to 2000 entries per page: fewer | more | all

Data Structures and Algorithms

New submissions for Tuesday, 2 July 2024 (showing 10 of 10 entries )

Cross submissions for Tuesday, 2 July 2024 (showing 8 of 8 entries )

Replacement submissions for Tuesday, 2 July 2024 (showing 20 of 20 entries )