Data Structures and Algorithms
See recent articles
Showing new listings for Friday, 27 September 2024
- [1] arXiv:2409.17250 [pdf, other]
-
Title: Kernelization Complexity of Solution Discovery ProblemsMario Grobler, Stephanie Maaz, Amer E. Mouawad, Naomi Nishimura, Vijayaragunathan Ramamoorthi, Sebastian SiebertzSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC); Combinatorics (math.CO)
In the solution discovery variant of a vertex (edge) subset problem $\Pi$ on graphs, we are given an initial configuration of tokens on the vertices (edges) of an input graph $G$ together with a budget $b$. The question is whether we can transform this configuration into a feasible solution of $\Pi$ on $G$ with at most $b$ modification steps. We consider the token sliding variant of the solution discovery framework, where each modification step consists of sliding a token to an adjacent vertex (edge). The framework of solution discovery was recently introduced by Fellows et al. [Fellows et al., ECAI 2023] and for many solution discovery problems the classical as well as the parameterized complexity has been established. In this work, we study the kernelization complexity of the solution discovery variants of Vertex Cover, Independent Set, Dominating Set, Shortest Path, Matching, and Vertex Cut with respect to the parameters number of tokens $k$, discovery budget $b$, as well as structural parameters such as pathwidth.
- [2] arXiv:2409.17623 [pdf, html, other]
-
Title: Fully Dynamic Graph Algorithms with Edge Differential PrivacyComments: 30 pages, 3 figuresSubjects: Data Structures and Algorithms (cs.DS); Cryptography and Security (cs.CR)
We study differentially private algorithms for analyzing graphs in the challenging setting of continual release with fully dynamic updates, where edges are inserted and deleted over time, and the algorithm is required to update the solution at every time step. Previous work has presented differentially private algorithms for many graph problems that can handle insertions only or deletions only (called partially dynamic algorithms) and obtained some hardness results for the fully dynamic setting. The only algorithms in the latter setting were for the edge count, given by Fichtenberger, Henzinger, and Ost (ESA 21), and for releasing the values of all graph cuts, given by Fichtenberger, Henzinger, and Upadhyay (ICML 23). We provide the first differentially private and fully dynamic graph algorithms for several other fundamental graph statistics (including the triangle count, the number of connected components, the size of the maximum matching, and the degree histogram), analyze their error and show strong lower bounds on the error for all algorithms in this setting. We study two variants of edge differential privacy for fully dynamic graph algorithms: event-level and item-level. We give upper and lower bounds on the error of both event-level and item-level fully dynamic algorithms for several fundamental graph problems. No fully dynamic algorithms that are private at the item-level (the more stringent of the two notions) were known before. In the case of item-level privacy, for several problems, our algorithms match our lower bounds.
- [3] arXiv:2409.17715 [pdf, html, other]
-
Title: Optimal Sensitivity Oracle for Steiner MincutSubjects: Data Structures and Algorithms (cs.DS)
Let $G=(V,E)$ be an undirected weighted graph on $n=|V|$ vertices and $S\subseteq V$ be a Steiner set. Steiner mincut is a well-studied concept, which provides a generalization to both (s,t)-mincut (when $|S|=2$) and global mincut (when $|S|=n$). Here, we address the problem of designing a compact data structure that can efficiently report a Steiner mincut and its capacity after the failure of any edge in $G$; such a data structure is known as a \textit{Sensitivity Oracle} for Steiner mincut.
In the area of minimum cuts, although many Sensitivity Oracles have been designed in unweighted graphs, however, in weighted graphs, Sensitivity Oracles exist only for (s,t)-mincut [Annals of Operations Research 1991, NETWORKS 2019, ICALP 2024], which is just a special case of Steiner mincut. Here, we generalize this result to any arbitrary set $S\subseteq V$.
1. Sensitivity Oracle: Assuming the capacity of every edge is known,
a. there is an ${\mathcal O}(n)$ space data structure that can report the capacity of Steiner mincut in ${\mathcal O}(1)$ time and
b. there is an ${\mathcal O}(n(n-|S|+1))$ space data structure that can report a Steiner mincut in ${\mathcal O}(n)$ time after the failure of any edge in $G$.
2. Lower Bound: We show that any data structure that, after the failure of any edge, can report a Steiner mincut or its capacity must occupy $\Omega(n^2)$ bits of space in the worst case, irrespective of the size of the Steiner set.
The lower bound in (2) shows that the assumption in (1) is essential to break the $\Omega(n^2)$ lower bound on space. For $|S|=n-k$ for any constant $k\ge 0$, it occupies only ${\mathcal O}(n)$ space. So, we also present the first Sensitivity Oracle occupying ${\mathcal O}(n)$ space for global mincut. - [4] arXiv:2409.18036 [pdf, html, other]
-
Title: Optimal Dynamic Parameterized Subset SamplingComments: 29 pages, 10 figures, to be published in PODS25Subjects: Data Structures and Algorithms (cs.DS); Databases (cs.DB)
In this paper, we study the Dynamic Parameterized Subset Sampling (DPSS) problem in the Word RAM model. In DPSS, the input is a set,~$S$, of~$n$ items, where each item,~$x$, has a non-negative integer weight,~$w(x)$. Given a pair of query parameters, $(\alpha, \beta)$, each of which is a non-negative rational number, a parameterized subset sampling query on~$S$ seeks to return a subset $T \subseteq S$ such that each item $x \in S$ is selected in~$T$, independently, with probability $p_x(\alpha, \beta) = \min \left\{\frac{w(x)}{\alpha \sum_{x\in S} w(x)+\beta}, 1 \right\}$. More specifically, the DPSS problem is defined in a dynamic setting, where the item set,~$S$, can be updated with insertions of new items or deletions of existing items. Our first main result is an optimal algorithm for solving the DPSS problem, which achieves~$O(n)$ pre-processing time, $O(1+\mu_S(\alpha,\beta))$ expected time for each query parameterized by $(\alpha, \beta)$, given on-the-fly, and $O(1)$ time for each update; here, $\mu_S(\alpha,\beta)$ is the expected size of the query result. At all times, the worst-case space consumption of our algorithm is linear in the current number of items in~$S$. Our second main contribution is a hardness result for the DPSS problem when the item weights are~$O(1)$-word float numbers, rather than integers. Specifically, we reduce Integer Sorting to the deletion-only DPSS problem with float item weights. Our reduction implies that an optimal algorithm for deletion-only DPSS with float item weights (achieving all the same bounds as aforementioned) implies an optimal algorithm for Integer Sorting. The latter remains an important open problem. Last but not least, a key technical ingredient for our first main result is an efficient algorithm for generating Truncated Geometric random variates in $O(1)$ expected time in the Word RAM model.
New submissions (showing 4 of 4 entries)
- [5] arXiv:2409.17329 (cross-list from cs.DB) [pdf, html, other]
-
Title: Dynamic direct access of MSO query evaluation over stringsSubjects: Databases (cs.DB); Data Structures and Algorithms (cs.DS); Formal Languages and Automata Theory (cs.FL)
We study the problem of evaluating a Monadic Second Order (MSO) query over strings under updates in the setting of direct access. We present an algorithm that, given an MSO query with first-order free variables represented by an unambiguous variable-set automaton $\mathcal{A}$ with state set $Q$ and variables $X$ and a string $s$, computes a data structure in time $\mathcal{O}(|Q|^\omega\cdot |X|^2 \cdot |s|)$ and, then, given an index $i$ retrieves, using the data structure, the $i$-th output of the evaluation of $\mathcal{A}$ over $s$ in time $\mathcal{O}(|Q|^\omega \cdot |X|^3 \cdot \log(|s|)^2)$ where $\omega$ is the exponent for matrix multiplication. Ours is the first efficient direct access algorithm for MSO query evaluation over strings; such algorithms so far had only been studied for first-order queries and conjunctive queries over relational data.
Our algorithm gives the answers in lexicographic order where, in contrast to the setting of conjunctive queries, the order between variables can be freely chosen by the user without degrading the runtime. Moreover, our data structure can be updated efficiently after changes to the input string, allowing more powerful updates than in the enumeration literature, e.g.~efficient deletion of substrings, concatenation and splitting of strings, and cut-and-paste operations. Our approach combines a matrix representation of MSO queries and a novel data structure for dynamic word problems over semi-groups which yields an overall algorithm that is elegant and easy to formulate. - [6] arXiv:2409.17424 (cross-list from cs.IR) [pdf, html, other]
-
Title: Results of the Big ANN: NeurIPS'23 competitionHarsha Vardhan Simhadri, Martin Aumüller, Amir Ingber, Matthijs Douze, George Williams, Magdalen Dobson Manohar, Dmitry Baranchuk, Edo Liberty, Frank Liu, Ben Landrum, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Yizhuo Chen, Weiguo Zheng, Zihao Wan, Jie Yin, Ben HuangComments: Code: this https URLSubjects: Information Retrieval (cs.IR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Performance (cs.PF)
The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search ~\cite{DBLP:conf/nips/SimhadriWADBBCH21}, this competition addressed filtered search, out-of-distribution data, sparse and streaming variants of ANNS. Participants developed and submitted innovative solutions that were evaluated on new standard datasets with constrained computational resources. The results showcased significant improvements in search accuracy and efficiency over industry-standard baselines, with notable contributions from both academic and industrial teams. This paper summarizes the competition tracks, datasets, evaluation metrics, and the innovative approaches of the top-performing submissions, providing insights into the current advancements and future directions in the field of approximate nearest neighbor search.
- [7] arXiv:2409.17567 (cross-list from cs.LG) [pdf, other]
-
Title: Derandomizing Multi-Distribution LearningSubjects: Machine Learning (cs.LG); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Statistics Theory (math.ST)
Multi-distribution or collaborative learning involves learning a single predictor that works well across multiple data distributions, using samples from each during training. Recent research on multi-distribution learning, focusing on binary loss and finite VC dimension classes, has shown near-optimal sample complexity that is achieved with oracle efficient algorithms. That is, these algorithms are computationally efficient given an efficient ERM for the class. Unlike in classical PAC learning, where the optimal sample complexity is achieved with deterministic predictors, current multi-distribution learning algorithms output randomized predictors. This raises the question: can these algorithms be derandomized to produce a deterministic predictor for multiple distributions? Through a reduction to discrepancy minimization, we show that derandomizing multi-distribution learning is computationally hard, even when ERM is computationally efficient. On the positive side, we identify a structural condition enabling an efficient black-box reduction, converting existing randomized multi-distribution predictors into deterministic ones.
- [8] arXiv:2409.17831 (cross-list from cs.CC) [pdf, other]
-
Title: Asymptotically Optimal Hardness for $k$-Set Packing and $k$-Matroid IntersectionComments: 14 pagesSubjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
For any $\varepsilon > 0$, we prove that $k$-Dimensional Matching is hard to approximate within a factor of $k/(12 + \varepsilon)$ for large $k$ unless $\textsf{NP} \subseteq \textsf{BPP}$. Listed in Karp's 21 $\textsf{NP}$-complete problems, $k$-Dimensional Matching is a benchmark computational complexity problem which we find as a special case of many constrained optimization problems over independence systems including: $k$-Set Packing, $k$-Matroid Intersection, and Matroid $k$-Parity. For all the aforementioned problems, the best known lower bound was a $\Omega(k /\log(k))$-hardness by Hazan, Safra, and Schwartz. In contrast, state-of-the-art algorithms achieved an approximation of $O(k)$. Our result narrows down this gap to a constant and thus provides a rationale for the observed algorithmic difficulties. The crux of our result hinges on a novel approximation preserving gadget from $R$-degree bounded $k$-CSPs over alphabet size $R$ to $kR$-Dimensional Matching. Along the way, we prove that $R$-degree bounded $k$-CSPs over alphabet size $R$ are hard to approximate within a factor $\Omega_k(R)$ using known randomised sparsification methods for CSPs.
- [9] arXiv:2409.17905 (cross-list from cs.DM) [pdf, html, other]
-
Title: Rotation distance using flowsSubjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
Splay trees are a simple and efficient dynamic data structure, invented by Sleator and Tarjan. The basic primitive for transforming a binary tree in this scheme is a rotation. Sleator, Tarjan, and Thurston proved that the maximum rotation distance between trees with n internal nodes is exactly 2n-6 for trees with n internal nodes (where n is larger than some constant). The proof of the upper bound is easy but the proof of the lower bound, remarkably, uses sophisticated arguments based on calculating hyperbolic volumes. We give an elementary proof of the same result. The main interest of the paper lies in the method, which is new. It basically relies on a potential function argument, similar to many amortized analyses. However, the potential of a tree is not defined explicitly, but by constructing an instance of a flow problem and using the max-flow min-cut theorem.
Cross submissions (showing 5 of 5 entries)
- [10] arXiv:2206.13773 (replaced) [pdf, other]
-
Title: On Relaxation of Dominant SetsComments: proof is flawedSubjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
In a graph $G = (V,E)$, a k-ruling set $S$ is one in which all vertices $V$ \ $S$ are at most $k$ distance from $S$. Finding a minimum k-ruling set is intrinsically linked to the minimum dominating set problem and maximal independent set problem, which have been extensively studied in graph theory. This paper presents the first known algorithm for solving all k-ruling set problems in conjunction with known minimum dominating set algorithms at only additional polynomial time cost compared to a minimum dominating set. The algorithm further succeeds for $(\alpha, \alpha - 1)$ ruling sets in which $\alpha > 1$, for which constraints exist on the proximity of vertices v $\in S$. This secondary application instead works in conjunction with maximal independent set algorithms.
- [11] arXiv:2309.05554 (replaced) [pdf, html, other]
-
Title: Concentration of Submodular Functions and Read-k Families Under Negative DependenceSubjects: Data Structures and Algorithms (cs.DS)
We study the question of whether submodular functions of random variables satisfying various notions of negative dependence satisfy Chernoff-like concentration inequalities. We prove such a concentration inequality for the lower tail when the random variables satisfy negative association or negative regression, partially resolving an open problem raised in (Qiu and Singla [QS22]). Previous work showed such concentration results for random variables that come from specific dependent-rounding algorithms (Chekuri, Vondrak, and Zenklusen [CVZ10] and Harvey and Olver [HO14]). We discuss some applications of our results to combinatorial optimization and beyond. We also show applications to the concentration of read-k families [Gav+15] under certain forms of negative dependence; we further show a simplified proof of the entropy-method approach of [Gav+15].
- [12] arXiv:2309.14730 (replaced) [pdf, html, other]
-
Title: On the Advice Complexity of Online Unit ClusteringSubjects: Data Structures and Algorithms (cs.DS)
In online unit clustering, points of a metric space arriving one by one must be partitioned into clusters of diameter at most 1, where the cost is the number of clusters.
This paper gives linear upper and lower bounds on the advice complexity of 1-competitive online unit clustering algorithms, in terms of the number of points in $\mathbb{R}^d$ and $\mathbb{Z}^d$. - [13] arXiv:2310.19594 (replaced) [pdf, html, other]
-
Title: Superpolynomial smoothed complexity of 3-FLIP in Local Max-CutComments: 19 pages, 3 figures, replaced section 3.1 by a known resultSubjects: Data Structures and Algorithms (cs.DS); Computational Complexity (cs.CC)
Local search algorithms for NP-hard problems such as Max-Cut frequently perform much better in practice than worst-case analysis suggests. Smoothed analysis has proved an effective approach to understanding this: a substantial literature shows that when a small amount of random noise is added to input data, local search algorithms typically run in polynomial or quasi-polynomial time. In this paper, we provide the first example where a local search algorithm for the Max-Cut problem fails to be efficient in the framework of smoothed analysis. Specifically, we construct a graph with $n$ vertices where the smoothed runtime of the 3-FLIP algorithm can be as large as $2^{\Omega(\sqrt{n})}$.
Additionally, for the setting without random noise, we give a new construction of graphs where the runtime of the FLIP algorithm is $2^{\Omega(n)}$ for any pivot rule. These graphs are much smaller and have a simpler structure than previous constructions. - [14] arXiv:2402.18469 (replaced) [pdf, html, other]
-
Title: Interval-Constrained Bipartite Matching over TimeSubjects: Data Structures and Algorithms (cs.DS)
Interval-constrained online bipartite matching problem frequently occurs in medical appointment scheduling: unit-time jobs representing patients arrive online and are assigned to a time slot within their given time interval. We consider a variant of this problem where reassignments are allowed and extend it by a notion of current time, which is decoupled from the job arrival events. As jobs appear, the current point in time gradually advances. Jobs that are assigned to the current time unit become processed, which fixes part of the matching and disables these jobs or slots for reassignments in future steps. We refer to these time-dependent restrictions on reassignments as the over-time property.
We show that FirstFit with reassignments according to the shortest augmenting path rule is $\frac{2}{3}$-competitive with respect to the matching cardinality, and that the bound is tight. Interestingly, this bound holds even if the number of reassignments per job is bound by a constant. For the number of reassignments performed by the algorithm, we show that it is in $\Omega(n \log n)$ in the worst case, where $n$ is the number of patients or jobs on the online side. This result is in line with lower bounds for the number of reassignments in online bipartite matching with reassignments, and, similarly to this previous work, we also conjecture that this bound should be tight. Moreover, we show that the algorithm maintaining the earliest-deadline-first order in the schedule yields maximum matchings, but at a cost of $\Omega(n^2)$ reassignments in worst case.
Finally, we consider the generalization of the problem in which the set of feasible slots has arbitrary shape. We show that FirstFit retains its competitivity in this case and that no other deterministic algorithm can be more than $\frac{2}{3}$-competitive. - [15] arXiv:2407.10526 (replaced) [pdf, html, other]
-
Title: 9/7-Approximation for Two-Edge-Connectivity and Two-Vertex-ConnectivityComments: 16 pages. The result is extended to 2-VCSS. Definitions and analysis changed accordinglySubjects: Data Structures and Algorithms (cs.DS)
We provide algorithms for the minimum 2-edge-connected spanning subgraph problem and the minimum 2-vertex-connected spanning subgraph problem with approximation ratio $\frac{9}{7}$. This improves upon a recent algorithm with ratio slightly smaller than $\frac{4}{3}$ for 2-edge-connectivity, and another one with ratio $\frac{4}{3}$ for 2-vertex-connectivity.
- [16] arXiv:2009.10677 (replaced) [pdf, html, other]
-
Title: On the Mysteries of MAX NAE-SATComments: 44 pages, 8 figures, accepted to SIDMASubjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS)
MAX NAE-SAT is a natural optimization problem, closely related to its better-known relative MAX SAT. The approximability status of MAX NAE-SAT is almost completely understood if all clauses have the same size $k$, for some $k\ge 2$. We refer to this problem as MAX NAE-$\{k\}$-SAT. For $k=2$, it is essentially the celebrated MAX CUT problem. For $k=3$, it is related to the MAX CUT problem in graphs that can be fractionally covered by triangles. For $k\ge 4$, it is known that an approximation ratio of $1-\frac{1}{2^{k-1}}$, obtained by choosing a random assignment, is optimal, assuming $P\ne NP$. For every $k\ge 2$, an approximation ratio of at least $\frac{7}{8}$ can be obtained for MAX NAE-$\{k\}$-SAT. There was some hope, therefore, that there is also a $\frac{7}{8}$-approximation algorithm for MAX NAE-SAT, where clauses of all sizes are allowed simultaneously.
Our main result is that there is no $\frac{7}{8}$-approximation algorithm for MAX NAE-SAT, assuming the unique games conjecture (UGC). In fact, even for almost satisfiable instances of MAX NAE-$\{3,5\}$-SAT (i.e., MAX NAE-SAT where all clauses have size $3$ or $5$), the best approximation ratio that can be achieved, assuming UGC, is at most $\frac{3(\sqrt{21}-4)}{2}\approx 0.8739$. Using calculus of variations, we extend the analysis of O'Donnell and Wu for MAX CUT to MAX NAE-$\{3\}$-SAT. We obtain an optimal algorithm, assuming UGC, for MAX NAE-$\{3\}$-SAT, slightly improving on previous algorithms. The approximation ratio of the new algorithm is $\approx 0.9089$.
We complement our theoretical results with some experimental results. We describe an approximation algorithm for almost satisfiable instances of MAX NAE-$\{3,5\}$-SAT with a conjectured approximation ratio of 0.8728, and an approximation algorithm for almost satisfiable instances of MAX NAE-SAT with a conjectured approximation ratio of 0.8698. - [17] arXiv:2212.07124 (replaced) [pdf, html, other]
-
Title: Data Structures for Approximate Discrete Fr\'echet DistanceComments: To appear in ISAAC 2024Subjects: Computational Geometry (cs.CG); Data Structures and Algorithms (cs.DS)
The Fréchet distance is a popular distance measure between curves $P$ and $Q$. Conditional lower bounds prohibit $(1 + \varepsilon)$-approximate Fréchet distance computations in strongly subquadratic time, even when preprocessing $P$ using any polynomial amount of time and space. As a consequence, the Fréchet distance has been studied under realistic input assumptions, for example, assuming both curves are $c$-packed.
In this paper, we study $c$-packed curves in Euclidean space $\mathbb R^d$ and in general geodesic metrics $\mathcal X$. In $\mathbb R^d$, we provide a nearly-linear time static algorithm for computing the $(1+\varepsilon)$-approximate continuous Fréchet distance between $c$-packed curves. Our algorithm has a linear dependence on the dimension $d$, as opposed to previous algorithms which have an exponential dependence on $d$.
In general geodesic metric spaces $\mathcal X$, little was previously known. We provide the first data structure, and thereby the first algorithm, under this model. Given a $c$-packed input curve $P$ with $n$ vertices, we preprocess it in $O(n \log n)$ time, so that given a query containing a constant $\varepsilon$ and a curve $Q$ with $m$ vertices, we can return a $(1+\varepsilon)$-approximation of the discrete Fréchet distance between $P$ and $Q$ in time polylogarithmic in $n$ and linear in $m$, $1/\varepsilon$, and the realism parameter $c$. Finally, we show several extensions to our data structure; to support dynamic extend/truncate updates on $P$, to answer map matching queries, and to answer Hausdorff distance queries. - [18] arXiv:2307.07727 (replaced) [pdf, html, other]
-
Title: Optimal Mixing via Tensorization for Random Independent Sets on Arbitrary TreesComments: The optimum mixing result (Theorem 1.2) of version 1 of the manuscript has been removed due to an errorSubjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
We study the mixing time of the single-site update Markov chain, known as the Glauber dynamics, for generating a random independent set of a tree. Our focus is obtaining optimal convergence results for arbitrary trees. We consider the more general problem of sampling from the Gibbs distribution in the hard-core model where independent sets are weighted by a parameter $\lambda>0$; the special case $\lambda=1$ corresponds to the uniform distribution over all independent sets. Previous work of Martinelli, Sinclair and Weitz (2004) obtained optimal mixing time bounds for the complete $\Delta$-regular tree for all $\lambda$. However, Restrepo et al. (2014) showed that for sufficiently large $\lambda$ there are bounded-degree trees where optimal mixing does not hold. Recent work of Eppstein and Frishberg (2022) proved a polynomial mixing time bound for the Glauber dynamics for arbitrary trees, and more generally for graphs of bounded tree-width.
We establish an optimal bound on the relaxation time (i.e., inverse spectral gap) of $O(n)$ for the Glauber dynamics for unweighted independent sets on arbitrary trees. We stress that our results hold for arbitrary trees and there is no dependence on the maximum degree $\Delta$. Interestingly, our results extend (far) beyond the uniqueness threshold which is on the order $\lambda=O(1/\Delta)$. Our proof approach is inspired by recent work on spectral independence. In fact, we prove that spectral independence holds with a constant independent of the maximum degree for any tree, but this does not imply mixing for general trees as the optimal mixing results of Chen, Liu, and Vigoda (2021) only apply for bounded degree graphs. We instead utilize the combinatorial nature of independent sets to directly prove approximate tensorization of variance via a non-trivial inductive proof. - [19] arXiv:2404.17080 (replaced) [pdf, html, other]
-
Title: Solving the Graph Burning Problem for Large GraphsComments: 10 pages, 1 figure and 2 tablesJournal-ref: A Row Generation Algorithm for Finding Optimal Burning Sequences of Large Graphs. In 32nd Annual European Symposium on Algorithms (ESA 2024). Leibniz International Proceedings in Informatics (LIPIcs), Volume 308, pp. 94:1-94:17, 2024Subjects: Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
We propose an exact algorithm for the Graph Burning Problem ($\texttt{GBP}$), an NP-hard optimization problem that models the spread of influence on social networks. Given a graph $G$ with vertex set $V$, the objective is to find a sequence of $k$ vertices in $V$, namely, $v_1, v_2, \dots, v_k$, such that $k$ is minimum and $\bigcup_{i = 1}^{k} \{u\! \in\! V\! : d(u, v_i) \leq k - i\} = V$, where $d(u,v)$ denotes the distance between $u$ and $v$. We formulate the problem as a set covering integer programming model and design a row generation algorithm for the $\texttt{GBP}$. Our method exploits the fact that a very small number of covering constraints is often sufficient for solving the integer model, allowing the corresponding rows to be generated on demand. To date, the most efficient exact algorithm for the $\texttt{GBP}$, denoted here by $\texttt{GDCA}$, is able to obtain optimal solutions for graphs with up to 14,000 vertices within two hours of execution. In comparison, our algorithm finds provably optimal solutions approximately 236 times faster, on average, than $\texttt{GDCA}$. For larger graphs, memory space becomes a limiting factor for $\texttt{GDCA}$. Our algorithm, however, solves real-world instances with almost 200,000 vertices in less than 35 seconds, increasing the size of graphs for which optimal solutions are known by a factor of 14.
- [20] arXiv:2409.13096 (replaced) [pdf, html, other]
-
Title: Fast decision tree learning solves hard coding-theoretic problemsComments: 31 pages, FOCS 2024Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
We connect the problem of properly PAC learning decision trees to the parameterized Nearest Codeword Problem ($k$-NCP). Despite significant effort by the respective communities, algorithmic progress on both problems has been stuck: the fastest known algorithm for the former runs in quasipolynomial time (Ehrenfeucht and Haussler 1989) and the best known approximation ratio for the latter is $O(n/\log n)$ (Berman and Karpinsky 2002; Alon, Panigrahy, and Yekhanin 2009). Research on both problems has thus far proceeded independently with no known connections.
We show that $\textit{any}$ improvement of Ehrenfeucht and Haussler's algorithm will yield $O(\log n)$-approximation algorithms for $k$-NCP, an exponential improvement of the current state of the art. This can be interpreted either as a new avenue for designing algorithms for $k$-NCP, or as one for establishing the optimality of Ehrenfeucht and Haussler's algorithm. Furthermore, our reduction along with existing inapproximability results for $k$-NCP already rule out polynomial-time algorithms for properly learning decision trees. A notable aspect of our hardness results is that they hold even in the setting of $\textit{weak}$ learning whereas prior ones were limited to the setting of strong learning.