Search | arXiv e-print repository

GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

Authors: Matthew Fahrbach, Srikumar Ramalingam, Morteza Zadimoghaddam, Sara Ahmadian, Gui Citovsky, Giulia DeSalvo

Abstract: We propose a novel subset selection task called min-distance diverse data summarization ($\textsf{MDDS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal is to maximize an objective that combines the total utility of the points and a diversity term that captures the minimum distance between any pair of selected points, subject to the constraint $|S| \le k$. For example, the points may correspond to training examples in a data sampling problem, e.g., learned embeddings of images extracted from a deep neural network. This work presents the $\texttt{GIST}$ algorithm, which achieves a $\frac{2}{3}$-approximation guarantee for $\textsf{MDDS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove a complementary $(\frac{2}{3}+\varepsilon)$-hardness of approximation, for any $\varepsilon > 0$. Finally, we provide an empirical study that demonstrates $\texttt{GIST}$ outperforms existing methods for $\textsf{MDDS}$ on synthetic data, and also for a real-world image classification experiment that studies single-shot subset selection for ImageNet.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 15 pages, 1 figure

arXiv:2402.04987 [pdf, other]

PriorBoost: An Adaptive Algorithm for Learning from Aggregate Responses

Authors: Adel Javanmard, Matthew Fahrbach, Vahab Mirrokni

Abstract: This work studies algorithms for learning from aggregate responses. We focus on the construction of aggregation sets (called bags in the literature) for event-level loss functions. We prove for linear regression and generalized linear models (GLMs) that the optimal bagging problem reduces to one-dimensional size-constrained $k$-means clustering. Further, we theoretically quantify the advantage of using curated bags over random bags. We then propose the PriorBoost algorithm, which adaptively forms bags of samples that are increasingly homogeneous with respect to (unobserved) individual responses to improve model quality. We study label differential privacy for aggregate learning, and we also provide extensive experiments showing that PriorBoost regularly achieves optimal model quality for event-level predictions, in stark contrast to non-adaptive algorithms.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 29 pages, 4 figures

arXiv:2311.06192 [pdf, other]

Greedy PIG: Adaptive Integrated Gradients

Authors: Kyriakos Axiotis, Sami Abu-al-haija, Lin Chen, Matthew Fahrbach, Gang Fu

Abstract: Deep learning has become the standard approach for most machine learning tasks. While its impact is undeniable, interpreting the predictions of deep learning models from a human perspective remains a challenge. In contrast to model training, model interpretability is harder to quantify and pose as an explicit optimization problem. Inspired by the AUC softmax information curve (AUC SIC) metric for evaluating feature attribution methods, we propose a unified discrete optimization framework for feature attribution and feature selection based on subset selection. This leads to a natural adaptive generalization of the path integrated gradients (PIG) method for feature attribution, which we call Greedy PIG. We demonstrate the success of Greedy PIG on a wide variety of tasks, including image feature attribution, graph compression/explanation, and post-hoc feature selection on tabular data. Our results show that introducing adaptivity is a powerful and versatile method for making attribution methods more powerful.

Submitted 10 November, 2023; originally announced November 2023.

arXiv:2311.03703 [pdf, other]

Practical Performance Guarantees for Pipelined DNN Inference

Authors: Aaron Archer, Matthew Fahrbach, Kuikui Liu, Prakash Prabhu

Abstract: We optimize pipeline parallelism for deep neural network (DNN) inference by partitioning model graphs into $k$ stages and minimizing the running time of the bottleneck stage, including communication. We give practical and effective algorithms for this NP-hard problem, but our emphasis is on tackling the practitioner's dilemma of deciding when a solution is good enough. To this end, we design novel mixed-integer programming (MIP) relaxations for proving lower bounds. Applying these methods to a diverse testbed of 369 production models, for $k \in \{2, 4, 8, 16, 32, 64\}$, we empirically show that these lower bounds are strong enough to be useful in practice. Our lower bounds are substantially stronger than standard combinatorial bounds. For example, evaluated via geometric means across a production testbed with $k = 16$ pipeline stages, our MIP formulations raise the lower bound from 0.4598 to 0.9452, expressed as a fraction of the best partition found. In other words, our improved lower bounds close the optimality gap by a factor of 9.855x.

Submitted 4 June, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

Comments: 17 pages, 5 figures

arXiv:2305.12102 [pdf, other]

Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

Authors: Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

Abstract: Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely high-cardinality features. This bottleneck has led to substantial progress in alternative embedding algorithms. Many of these methods, however, make the assumption that each feature uses an independent embedding table. This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used across many different categorical features. Our theoretical and empirical analysis reveals that multiplexed embeddings can be decomposed into components from each constituent feature, allowing models to distinguish between features. We show that multiplexed representations lead to Pareto-optimal parameter-accuracy tradeoffs for three public benchmark datasets. Further, we propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. Unified embedding gives significant improvements in offline and online metrics compared to highly competitive baselines across five web-scale search, ads, and recommender systems, where it serves billions of users across the world in industry-leading products.

Submitted 14 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

Comments: NeurIPS'23 Spotlight

Journal ref: Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023) 56234-56255

arXiv:2303.15634 [pdf, other]

Learning Rate Schedules in the Presence of Distribution Shift

Authors: Matthew Fahrbach, Adel Javanmard, Vahab Mirrokni, Pratik Worah

Abstract: We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift and we give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically increase in the presence of distribution shift. Finally, we provide experiments for high-dimensional regression models and neural networks to illustrate these learning rate schedules and their cumulative regret.

Submitted 20 August, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

Comments: 33 pages, 6 figures

Journal ref: Proceedings of the 40th International Conference on Machine Learning (ICML 2023) 9523-9546

arXiv:2302.03886 [pdf, other]

Approximately Optimal Core Shapes for Tensor Decompositions

Authors: Mehrdad Ghadiri, Matthew Fahrbach, Gang Fu, Vahab Mirrokni

Abstract: This work studies the combinatorial optimization problem of finding an optimal core tensor shape, also called multilinear rank, for a size-constrained Tucker decomposition. We give an algorithm with provable approximation guarantees for its reconstruction error via connections to higher-order singular values. Specifically, we introduce a novel Tucker packing problem, which we prove is NP-hard, and give a polynomial-time approximation scheme based on a reduction to the 2-dimensional knapsack problem with a matroid constraint. We also generalize our techniques to tree tensor network decompositions. We implement our algorithm using an integer programming solver, and show that its solution quality is competitive with (and sometimes better than) the greedy algorithm that uses the true Tucker decomposition loss at each step, while also running up to 1000x faster.

Submitted 8 February, 2023; originally announced February 2023.

Comments: 18 pages, 4 figures

Journal ref: Proceedings of the 40th International Conference on Machine Learning (ICML 2023) 11237-11254

arXiv:2209.14881 [pdf, other]

Sequential Attention for Feature Selection

Authors: Taisuke Yasuda, MohammadHossein Bateni, Lin Chen, Matthew Fahrbach, Gang Fu, Vahab Mirrokni

Abstract: Feature selection is the problem of selecting a subset of features for a machine learning model that maximizes model quality subject to a budget constraint. For neural networks, prior methods, including those based on $\ell_1$ regularization, attention, and other techniques, typically select the entire feature subset in one evaluation round, ignoring the residual value of features during selection, i.e., the marginal contribution of a feature given that other features have already been selected. We propose a feature selection algorithm called Sequential Attention that achieves state-of-the-art empirical results for neural networks. This algorithm is based on an efficient one-pass implementation of greedy forward selection and uses attention weights at each step as a proxy for feature importance. We give theoretical insights into our algorithm for linear regression by showing that an adaptation to this setting is equivalent to the classical Orthogonal Matching Pursuit (OMP) algorithm, and thus inherits all of its provable guarantees. Our theoretical and empirical analyses offer new explanations towards the effectiveness of attention and its connections to overparameterization, which may be of independent interest.

Submitted 25 April, 2023; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: Accepted to ICLR 2023

Journal ref: Proceedings of the 11th International Conference on Learning Representations (ICLR 2023)
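The abstract above relates Sequential Attention, in the linear regression setting, to classical Orthogonal Matching Pursuit (OMP). As a reference point, here is a minimal NumPy sketch of textbook OMP only; it is not the Sequential Attention algorithm, and the function name `orthogonal_matching_pursuit` and its arguments are illustrative choices that assume the columns of `X` have comparable norms.

```python
import numpy as np

def orthogonal_matching_pursuit(X, y, k):
    """Textbook OMP: greedily select k columns of X to explain y.

    Assumes (for illustration) that the columns of X have comparable norms.
    Returns the selected column indices and their least-squares coefficients.
    """
    residual = y.astype(float)
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Score each column by its absolute correlation with the residual.
        scores = np.abs(X.T @ residual)
        # Never reselect a column that is already in the support.
        scores[np.array(support, dtype=int)] = 0.0
        support.append(int(np.argmax(scores)))
        # Refit on all selected columns, then recompute the residual.
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        residual = y - X[:, support] @ coef
    return support, coef
```

Each round refits all selected coefficients, which is what makes the residual orthogonal to the chosen columns and distinguishes OMP from plain matching pursuit.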

arXiv:2209.04876 [pdf, other]

Subquadratic Kronecker Regression with Applications to Tensor Decomposition

Authors: Matthew Fahrbach, Thomas Fu, Mehrdad Ghadiri

Abstract: Kronecker regression is a highly-structured least squares problem $\min_{\mathbf{x}} \lVert \mathbf{K}\mathbf{x} - \mathbf{b} \rVert_{2}^2$, where the design matrix $\mathbf{K} = \mathbf{A}^{(1)} \otimes \cdots \otimes \mathbf{A}^{(N)}$ is a Kronecker product of factor matrices. This regression problem arises in each step of the widely-used alternating least squares (ALS) algorithm for computing the Tucker decomposition of a tensor. We present the first subquadratic-time algorithm for solving Kronecker regression to a $(1+\varepsilon)$-approximation that avoids the exponential term $O(\varepsilon^{-N})$ in the running time. Our techniques combine leverage score sampling and iterative methods. By extending our approach to block-design matrices where one block is a Kronecker product, we also achieve subquadratic-time algorithms for (1) Kronecker ridge regression and (2) updating the factor matrices of a Tucker decomposition in ALS, which is not a pure Kronecker regression problem, thereby improving the running time of all steps of Tucker ALS. We demonstrate the speed and accuracy of this Kronecker regression algorithm on synthetic data and real-world image tensors.

Submitted 12 May, 2023; v1 submitted 11 September, 2022; originally announced September 2022.

Comments: 36 pages, 1 figure, 12 tables. arXiv admin note: text overlap with arXiv:2107.10654

MSC Class: 62J05; 62J07; 65F10 ACM Class: F.2.1; G.1.3; G.1.6

Journal ref: Advances in Neural Information Processing Systems 35 (2022): 28776-28789
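To make the problem structure in the abstract above concrete, the following NumPy sketch solves a small Kronecker regression exactly, using the identity that the pseudoinverse of a Kronecker product is the Kronecker product of the pseudoinverses. This is a plain exact baseline, not the paper's sampling-based subquadratic algorithm; the function name `kronecker_least_squares` is a hypothetical helper introduced here.

```python
import numpy as np

def kronecker_least_squares(factors, b):
    """Solve min_x ||(A1 kron ... kron AN) x - b||_2 without forming the product.

    Uses pinv(A kron B) = pinv(A) kron pinv(B): b is reshaped into a tensor
    whose k-th axis has length rows(A_k), and each factor's pseudoinverse is
    applied along its own axis. Row-major reshaping matches np.kron ordering.
    """
    X = b.reshape([A.shape[0] for A in factors])
    for mode, A in enumerate(factors):
        # Contract pinv(A) with axis `mode`, then move the new axis back in place.
        X = np.moveaxis(np.tensordot(np.linalg.pinv(A), X, axes=(1, mode)), 0, mode)
    return X.reshape(-1)
```

The design matrix with prod(rows) by prod(cols) entries is never materialized; only the small factor matrices are touched, which is the structural property that faster algorithms for this problem exploit.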

arXiv:2107.10654 [pdf, other]

Fast Low-Rank Tensor Decomposition by Ridge Leverage Score Sampling

Authors: Matthew Fahrbach, Mehrdad Ghadiri, Thomas Fu

Abstract: Low-rank tensor decomposition generalizes low-rank matrix approximation and is a powerful technique for discovering low-dimensional structure in high-dimensional data. In this paper, we study Tucker decompositions and use tools from randomized numerical linear algebra called ridge leverage scores to accelerate the core tensor update step in the widely-used alternating least squares (ALS) algorithm. Updating the core tensor, a severe bottleneck in ALS, is a highly-structured ridge regression problem where the design matrix is a Kronecker product of the factor matrices. We show how to use approximate ridge leverage scores to construct a sketched instance for any ridge regression problem such that the solution vector for the sketched problem is a $(1+\varepsilon)$-approximation to the original instance. Moreover, we show that classical leverage scores suffice as an approximation, which then allows us to exploit the Kronecker structure and update the core tensor in time that depends predominantly on the rank and the sketching parameters (i.e., sublinear in the size of the input tensor). We also give upper bounds for ridge leverage scores as rows are removed from the design matrix (e.g., if the tensor has missing entries), and we demonstrate the effectiveness of our approximate ridge regression algorithm for large, low-rank Tucker decompositions on both synthetic and real-world data.

Submitted 22 July, 2021; originally announced July 2021.

Comments: 29 pages, 1 figure
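For readers unfamiliar with the quantity named in the abstract above, the sketch below computes exact ridge leverage scores from the standard definition $\tau_i = \mathbf{a}_i^\top (\mathbf{A}^\top \mathbf{A} + \lambda \mathbf{I})^{-1} \mathbf{a}_i$. It illustrates the definition only; the paper's approximate scores and sketching procedure are not reproduced, and the name `ridge_leverage_scores` is an assumption made for this example.

```python
import numpy as np

def ridge_leverage_scores(A, lam):
    """Exact ridge leverage scores tau_i = a_i^T (A^T A + lam * I)^{-1} a_i.

    With lam = 0 and full column rank these are the classical leverage
    scores, which sum to the number of columns of A.
    """
    d = A.shape[1]
    gram = A.T @ A + lam * np.eye(d)
    # Solve gram @ Z = A^T instead of forming an explicit inverse.
    Z = np.linalg.solve(gram, A.T)  # shape (d, n)
    # Row-wise inner products <a_i, z_i>.
    return np.einsum("ij,ji->i", A, Z)
```

Sampling rows with probability proportional to these scores is the usual way such scores are used to build a smaller regression instance.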

arXiv:2007.02817 [pdf, other]

Faster Graph Embeddings via Coarsening

Authors: Matthew Fahrbach, Gramoz Goranci, Richard Peng, Sushant Sachdeva, Chi Wang

Abstract: Graph embeddings are a ubiquitous tool for machine learning tasks, such as node classification and link prediction, on graph-structured data. However, computing the embeddings for large-scale graphs is prohibitively inefficient even if we are interested only in a small subset of relevant vertices. To address this, we present an efficient graph coarsening approach, based on Schur complements, for computing the embedding of the relevant vertices. We prove that these embeddings are preserved exactly by the Schur complement graph that is obtained via Gaussian elimination on the non-relevant vertices. As computing Schur complements is expensive, we give a nearly-linear time algorithm that generates a coarsened graph on the relevant vertices that provably matches the Schur complement in expectation in each iteration. Our experiments involving prediction tasks on graphs demonstrate that computing embeddings on the coarsened graph, rather than the entire graph, leads to significant time savings without sacrificing accuracy.

Submitted 22 October, 2020; v1 submitted 6 July, 2020; originally announced July 2020.

Comments: 18 pages, 2 figures, to appear in the Proceedings of the 37th International Conference on Machine Learning (ICML 2020)

Journal ref: Proceedings of the 37th International Conference on Machine Learning (ICML 2020) 2953-2963
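The Schur complement graph mentioned in the abstract above can be computed directly for small inputs. The sketch below does the exact (and expensive) dense computation on a graph Laplacian; it is not the nearly-linear time randomized coarsening from the paper, and `schur_complement` is a hypothetical helper that assumes a connected graph and a non-empty set of kept vertices.

```python
import numpy as np

def schur_complement(L, keep):
    """Schur complement of a graph Laplacian L onto the vertices in `keep`.

    Eliminates all other vertices: S = L_kk - L_kd @ inv(L_dd) @ L_dk.
    Assumes the graph is connected and `keep` is non-empty, so that the
    eliminated block L_dd is invertible.
    """
    keep = np.asarray(keep)
    drop = np.setdiff1d(np.arange(L.shape[0]), keep)
    L_kk = L[np.ix_(keep, keep)]
    L_kd = L[np.ix_(keep, drop)]
    L_dd = L[np.ix_(drop, drop)]
    return L_kk - L_kd @ np.linalg.solve(L_dd, L_kd.T)
```

For the unit-weight path 0-1-2, eliminating the middle vertex leaves a single edge of weight 1/2 between the endpoints, matching the series rule for resistors.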

arXiv:2005.01929 [pdf, other]

doi 10.1145/3556971

Edge-Weighted Online Bipartite Matching

Authors: Matthew Fahrbach, Zhiyi Huang, Runzhou Tao, Morteza Zadimoghaddam

Abstract: Online bipartite matching and its variants are among the most fundamental problems in the online algorithms literature. Karp, Vazirani, and Vazirani (STOC 1990) introduced an elegant algorithm for the unweighted problem that achieves an optimal competitive ratio of $1-1/e$. Later, Aggarwal et al. (SODA 2011) generalized their algorithm and analysis to the vertex-weighted case. Little is known, however, about the most general edge-weighted problem aside from the trivial $1/2$-competitive greedy algorithm. In this paper, we present the first online algorithm that breaks the long-standing $1/2$ barrier and achieves a competitive ratio of at least $0.5086$. In light of the hardness result of Kapralov, Post, and Vondrák (SODA 2013) that restricts beating a $1/2$ competitive ratio for the more general problem of monotone submodular welfare maximization, our result can be seen as strong evidence that edge-weighted bipartite matching is strictly easier than submodular welfare maximization in the online setting. The main ingredient in our online matching algorithm is a novel subroutine called online correlated selection (OCS), which takes a sequence of pairs of vertices as input and selects one vertex from each pair. Instead of using a fresh random bit to choose a vertex from each pair, the OCS negatively correlates decisions across different pairs and provides a quantitative measure on the level of correlation. We believe our OCS technique is of independent interest and will find further applications in other online optimization problems.

Submitted 4 May, 2020; originally announced May 2020.

Comments: 36 pages, 5 figures. This work merges and refines the results in arXiv:1704.05384, arXiv:1910.02569, and arXiv:1910.03287. In particular, we fix a bug in arXiv:1910.03287 and have a smaller competitive ratio as a result

Journal ref: Journal of the ACM 69(6): 45:1-45:35 (2022)

arXiv:1907.12119 [pdf, other]

doi 10.1137/1.9781611976465.45

A Fast Minimum Degree Algorithm and Matching Lower Bound

Authors: Robert Cummings, Matthew Fahrbach, Animesh Fatehpuria

Abstract: The minimum degree algorithm is one of the most widely-used heuristics for reducing the cost of solving large sparse systems of linear equations. It has been studied for nearly half a century and has a rich history of bridging techniques from data structures, graph algorithms, and scientific computing. In this paper, we present a simple but novel combinatorial algorithm for computing an exact minimum degree elimination ordering in $O(nm)$ time, which improves on the best known time complexity of $O(n^3)$ and offers practical improvements for sparse systems with small values of $m$. Our approach leverages a careful amortized analysis, which also allows us to derive output-sensitive bounds for the running time of $O(\min\{m\sqrt{m^+}, \Delta m^+\} \log n)$, where $m^+$ is the number of unique fill edges and original edges that the algorithm encounters and $\Delta$ is the maximum degree of the input graph. Furthermore, we show there cannot exist an exact minimum degree algorithm that runs in $O(nm^{1-\varepsilon})$ time, for any $\varepsilon > 0$, assuming the strong exponential time hypothesis. This fine-grained reduction goes through the orthogonal vectors problem and uses a new low-degree graph construction called $U$-fillers, which act as pathological inputs and cause any minimum degree algorithm to exhibit nearly worst-case performance. With these two results, we nearly characterize the time complexity of computing an exact minimum degree ordering.

Submitted 22 July, 2020; v1 submitted 28 July, 2019; originally announced July 2019.

Comments: 17 pages

Journal ref: Proceedings of the 32nd Annual ACM-SIAM Symposium on Discrete Algorithms (2021) 724-734
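As background for the abstract above, here is a deliberately naive sketch of the minimum degree heuristic on an explicit elimination graph: repeatedly remove a vertex of smallest current degree and connect its neighbors into a clique (the fill edges). This is the straightforward version, not the $O(nm)$ algorithm from the paper; the function name and the tie-breaking rule by smallest vertex label are assumptions made for the example.

```python
def minimum_degree_ordering(adj):
    """Naive exact minimum degree elimination ordering.

    `adj` maps each vertex to the set of its neighbors (undirected graph).
    Ties are broken by smallest vertex label, an arbitrary choice made here.
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    order = []
    while graph:
        # Pick a vertex of minimum degree in the current elimination graph.
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        for u in nbrs:
            graph[u].discard(v)
            # Eliminating v turns its neighborhood into a clique (fill edges).
            graph[u] |= nbrs - {u}
        order.append(v)
    return order
```

On a star with center 0 and leaves 1, 2, 3, the leaves are eliminated first and no fill edges are created, which is the behavior the heuristic is designed to favor.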

arXiv:1904.01495 [pdf, other]

Slow Mixing of Glauber Dynamics for the Six-Vertex Model in the Ordered Phases

Authors: Matthew Fahrbach, Dana Randall

Abstract: The six-vertex model in statistical physics is a weighted generalization of the ice model on $\mathbb{Z}^2$ (i.e., Eulerian orientations) and the zero-temperature three-state Potts model (i.e., proper three-colorings). The phase diagram of the model depicts its physical properties and suggests where local Markov chains will be efficient. In this paper, we analyze the mixing time of Glauber dynamics for the six-vertex model in the ordered phases. Specifically, we show that for all Boltzmann weights in the ferroelectric phase, there exist boundary conditions such that local Markov chains require exponential time to converge to equilibrium. This is the first rigorous result bounding the mixing time of Glauber dynamics in the ferroelectric phase. Our analysis demonstrates a fundamental connection between correlated random walks and the dynamics of intersecting lattice path models (or routings). We analyze the Glauber dynamics for the six-vertex model with free boundary conditions in the antiferroelectric phase and significantly extend the region for which local Markov chains are known to be slow mixing. This result relies on a Peierls argument and novel properties of weighted non-backtracking walks.

Submitted 22 December, 2020; v1 submitted 2 April, 2019; originally announced April 2019.

Comments: 28 pages, 6 figures, Proceedings of the 23rd International Conference on Randomization and Computation (RANDOM 2019)

arXiv:1808.06932 [pdf, other]

Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity

Authors: Matthew Fahrbach, Vahab Mirrokni, Morteza Zadimoghaddam

Abstract: Submodular maximization is a general optimization problem with a wide range of applications in machine learning (e.g., active learning, clustering, and feature selection). In large-scale optimization, the parallel running time of an algorithm is governed by its adaptivity, which measures the number of sequential rounds needed if the algorithm can execute polynomially-many independent oracle queries in parallel. While low adaptivity is ideal, it is not sufficient for an algorithm to be efficient in practice -- there are many applications of distributed submodular optimization where the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of submodular maximization. In this paper, we give the first constant-factor approximation algorithm for maximizing a non-monotone submodular function subject to a cardinality constraint $k$ that runs in $O(\log(n))$ adaptive rounds and makes $O(n \log(k))$ oracle queries in expectation. In our empirical study, we use three real-world applications to compare our algorithm with several benchmarks for non-monotone submodular maximization. The results demonstrate that our algorithm finds competitive solutions using significantly fewer rounds and queries.

Submitted 7 April, 2023; v1 submitted 19 August, 2018; originally announced August 2018.

Comments: 19 pages, 8 figures. This version fixes a bug in the threshold sampling algorithm that implicitly assumed monotonicity. All original results hold

Journal ref: Proceedings of the 36th International Conference on Machine Learning (ICML 2019) 1833-1842

arXiv:1807.07889 [pdf, ps, other]

doi 10.1137/1.9781611975482.17

Submodular Maximization with Nearly Optimal Approximation, Adaptivity and Query Complexity

Authors: Matthew Fahrbach, Vahab Mirrokni, Morteza Zadimoghaddam

Abstract: Submodular optimization generalizes many classic problems in combinatorial optimization and has recently found a wide range of applications in machine learning (e.g., feature engineering and active learning). For many large-scale optimization problems, we are often concerned with the adaptivity complexity of an algorithm, which quantifies the number of sequential rounds where polynomially-many independent function evaluations can be executed in parallel. While low adaptivity is ideal, it is not sufficient for a distributed algorithm to be efficient, since in many practical applications of submodular optimization the number of function evaluations becomes prohibitively expensive. Motivated by these applications, we study the adaptivity and query complexity of adaptive submodular optimization. Our main result is a distributed algorithm for maximizing a monotone submodular function with cardinality constraint $k$ that achieves a $(1-1/e-\varepsilon)$-approximation in expectation. This algorithm runs in $O(\log(n))$ adaptive rounds and makes $O(n)$ calls to the function evaluation oracle in expectation. The approximation guarantee and query complexity are optimal, and the adaptivity is nearly optimal. Moreover, the number of queries is substantially less than in previous works. Last, we extend our results to the submodular cover problem to demonstrate the generality of our algorithm and techniques.

Submitted 7 April, 2023; v1 submitted 20 July, 2018; originally announced July 2018.

Comments: 30 pages. This version fixes minor bugs with the definition of $I_t$ and the termination condition of Algorithm 5. We also update all theorem statements to explicitly assume monotone submodular functions

Journal ref: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2019) 255-273

arXiv:1804.04239 [pdf, ps, other]

doi 10.1109/FOCS.2018.00019

Graph Sketching Against Adaptive Adversaries Applied to the Minimum Degree Algorithm

Authors: Matthew Fahrbach, Gary L. Miller, Richard Peng, Saurabh Sawlani, Junxing Wang, Shen Chen Xu

Abstract: Motivated by the study of matrix elimination orderings in combinatorial scientific computing, we utilize graph sketching and local sampling to give a data structure that provides access to approximate fill degrees of a matrix undergoing elimination in $O(\text{polylog}(n))$ time per elimination and query. We then study the problem of using this data structure in the minimum degree algorithm, which is a widely-used heuristic for producing elimination orderings for sparse matrices by repeatedly eliminating the vertex with (approximate) minimum fill degree. This leads to a nearly-linear time algorithm for generating approximate greedy minimum degree orderings. Despite extensive studies of algorithms for elimination orderings in combinatorial scientific computing, our result is the first rigorous incorporation of randomized tools in this setting, as well as the first nearly-linear time algorithm for producing elimination orderings with provable approximation guarantees. While our sketching data structure readily works in the oblivious adversary model, by repeatedly querying and greedily updating itself, it enters the adaptive adversarial model where the underlying sketches become prone to failure due to dependency issues with their internal randomness. We show how to use an additional sampling procedure to circumvent this problem and to create an independent access sequence. Our technique for decorrelating the interleaved queries and updates to this randomized data structure may be of independent interest.

Submitted 11 April, 2018; originally announced April 2018.

Comments: 58 pages, 3 figures. This is a substantially revised version of arXiv:1711.08446 with an emphasis on the underlying theoretical problems

Journal ref: Proceedings of the 59th Annual IEEE Symposium on Foundations of Computer Science (2018) 101-112

arXiv:1711.08446 [pdf, ps, other]

On Computing Min-Degree Elimination Orderings

Authors: Matthew Fahrbach, Gary L. Miller, Richard Peng, Saurabh Sawlani, Junxing Wang, Shen Chen Xu

Abstract: We study faster algorithms for producing the minimum degree ordering used to speed up Gaussian elimination. This ordering is based on viewing the non-zero elements of a symmetric positive definite matrix as edges of an undirected graph, and aims at reducing the additional non-zeros (fill) in the matrix by repeatedly removing the vertex of minimum degree. It is one of the most widely used primitives for pre-processing sparse matrices in scientific computing. Our result is in part motivated by the observation that sub-quadratic time algorithms for finding min-degree orderings are unlikely, assuming the strong exponential time hypothesis (SETH). This provides justification for the lack of provably efficient algorithms for generating such orderings, and leads us to study speedups via degree-restricted algorithms as well as approximations. Our two main results are: (1) an algorithm that produces a min-degree ordering whose maximum degree is bounded by $\Delta$ in $O(m \Delta \log^3{n})$ time, and (2) an algorithm that finds a $(1 + \varepsilon)$-approximate marginal min-degree ordering in $O(m \log^{5}n \, \varepsilon^{-2})$ time. Both of our algorithms rely on a host of randomization tools related to the $\ell_0$-estimator by [Cohen '97]. A key technical issue for the final nearly-linear time algorithm is the dependency of the removed vertex on the randomness in the data structures. To address this, we provide a method for generating a pseudo-deterministic access sequence, which then allows the incorporation of data structures that only work under the oblivious adversary model.

Submitted 22 November, 2017; originally announced November 2017.

Comments: 57 pages
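
The elimination process described in the abstract above (repeatedly remove a vertex of minimum degree, then connect its remaining neighbors so that the fill is represented as new edges) can be illustrated with a small exact baseline. This is only a sketch of the classical heuristic, not the paper's degree-restricted or approximate algorithms; the function name `min_degree_ordering`, the adjacency-dictionary input, and the smallest-label tie-breaking rule are assumptions made for illustration.

```python
def min_degree_ordering(adj):
    """Exact min-degree elimination on an undirected graph.

    adj: dict mapping each vertex (comparable labels, e.g. ints) to an
    iterable of its neighbors. Returns (elimination order, number of
    fill edges created). Ties are broken by smallest label (assumption).
    """
    # Work on a copy so the caller's graph is left untouched.
    graph = {v: set(ns) for v, ns in adj.items()}
    order = []
    fill = 0
    while graph:
        # Pick the remaining vertex with the fewest remaining neighbors.
        v = min(graph, key=lambda u: (len(graph[u]), u))
        neighbors = sorted(graph.pop(v))
        for u in neighbors:
            graph[u].discard(v)
        # Eliminating v turns its neighborhood into a clique;
        # every edge added here corresponds to a fill entry.
        for i, a in enumerate(neighbors):
            for b in neighbors[i + 1:]:
                if b not in graph[a]:
                    graph[a].add(b)
                    graph[b].add(a)
                    fill += 1
        order.append(v)
    return order, fill


if __name__ == "__main__":
    # A 4-cycle: eliminating any vertex first creates one fill edge.
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(min_degree_ordering(cycle))
```

Each step rescans all remaining vertices and may touch every pair of neighbors, so this baseline is far from the nearly-linear running times stated in the abstract; it is meant only to make the definition of the ordering and of fill concrete.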

arXiv:1708.02266 [pdf, other]

doi 10.1137/1.9781611975062.10

Analyzing Boltzmann Samplers for Bose-Einstein Condensates with Dirichlet Generating Functions

Authors: Megan Bernstein, Matthew Fahrbach, Dana Randall

Abstract: Boltzmann sampling is commonly used to uniformly sample objects of a particular size from large combinatorial sets. For this technique to be effective, one needs to prove that (1) the sampling procedure is efficient and (2) objects of the desired size are generated with sufficiently high probability. We use this approach to give a provably efficient sampling algorithm for a class of weighted integer partitions related to Bose-Einstein condensation from statistical physics. Our sampling algorithm is a probabilistic interpretation of the ordinary generating function for these objects, derived from the symbolic method of analytic combinatorics. Using the Khintchine-Meinardus probabilistic method to bound the rejection rate of our Boltzmann sampler through singularity analysis of Dirichlet generating functions, we offer an alternative approach to analyze Boltzmann samplers for objects with multiplicative structure.

Submitted 13 November, 2017; v1 submitted 7 August, 2017; originally announced August 2017.

Comments: 20 pages, 1 figure

Journal ref: Proceedings of the 15th Workshop on Analytic Algorithmics and Combinatorics (ANALCO 2018) 107-117

arXiv:1704.05384 [pdf, other]

Online Weighted Matching: Breaking the $\frac{1}{2}$ Barrier

Authors: Matthew Fahrbach, Morteza Zadimoghaddam

Abstract: Online matching and its variants are some of the most fundamental problems in the online algorithms literature. In this paper, we study the online weighted bipartite matching problem. Karp et al. (STOC 1990) gave an elegant algorithm in the unweighted case that achieves a tight competitive ratio of $1-1/e$. In the weighted case, however, we can easily show that no competitive ratio is obtainable without the commonly accepted free disposal assumption. Under this assumption, it is not hard to prove that the greedy algorithm is $1/2$-competitive, and that this is tight for deterministic algorithms. We present the first randomized algorithm that breaks this long-standing $1/2$ barrier and achieves a competitive ratio of at least $0.501$. In light of the hardness result of Kapralov et al. (SODA 2013) that restricts beating a $1/2$ competitive ratio for the monotone submodular welfare maximization problem, our result can be seen as strong evidence that solving the weighted bipartite matching problem is strictly easier than submodular welfare maximization in the online setting. Our approach relies on a very controlled use of randomness, which allows our algorithm to safely make adaptive decisions based on its previous assignments.

Submitted 21 November, 2019; v1 submitted 18 April, 2017; originally announced April 2017.

Comments: 28 pages, 1 figure. This is a substantially revised version that simplifies the presentation and fixes some minor problems
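
The deterministic greedy baseline mentioned in the abstract above can be sketched as follows. Under free disposal, an offline vertex may be reassigned to a heavier edge and only its heaviest assigned edge counts, so the greedy rule places each arriving vertex where the marginal gain is largest. This is not the paper's randomized algorithm; the function name `greedy_free_disposal` and the input format (one dictionary of edge weights per arriving vertex) are assumptions made for illustration.

```python
def greedy_free_disposal(arrivals):
    """Greedy online weighted bipartite matching with free disposal.

    arrivals: list of dicts, one per online vertex in arrival order,
    mapping offline vertex -> edge weight (hypothetical input format).
    Returns (heaviest kept weight per offline vertex, total weight).
    """
    kept = {}  # offline vertex -> weight of the edge it currently keeps
    for weights in arrivals:
        best_gain, choice = 0, None
        for u, w in weights.items():
            # Marginal gain of replacing u's current edge with this one.
            gain = w - kept.get(u, 0)
            if gain > best_gain:
                best_gain, choice = gain, u
        if choice is not None:
            # Free disposal: the previously kept edge is simply dropped.
            kept[choice] = weights[choice]
    return kept, sum(kept.values())


if __name__ == "__main__":
    # Greedy commits "a" to the first arrival, then disposes of that edge.
    # The offline optimum here would be 1 + 5 = 6.
    print(greedy_free_disposal([{"a": 2, "b": 1}, {"a": 5}]))
```

In this small instance the greedy total is 5 against an offline optimum of 6, which illustrates how committing early can lose value even though disposal is free.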

arXiv:1704.04830 [pdf, other]

Nearly Tight Bounds for Sandpile Transience on the Grid

Authors: David Durfee, Matthew Fahrbach, Yu Gao, Tao Xiao

Abstract: We use techniques from the theory of electrical networks to give nearly tight bounds for the transience class of the Abelian sandpile model on the two-dimensional grid up to polylogarithmic factors. The Abelian sandpile model is a discrete process on graphs that is intimately related to the phenomenon of self-organized criticality. In this process, vertices receive grains of sand, and once the number of grains exceeds their degree, they topple by sending grains to their neighbors. The transience class of a model is the maximum number of grains that can be added to the system before it necessarily reaches its steady-state behavior or, equivalently, a recurrent state. Through a more refined and global analysis of electrical potentials and random walks, we give an $O(n^4\log^4{n})$ upper bound and an $\Omega(n^4)$ lower bound for the transience class of the $n \times n$ grid. Our methods naturally extend to $n^d$-sized $d$-dimensional grids to give $O(n^{3d - 2}\log^{d+2}{n})$ upper bounds and $\Omega(n^{3d -2})$ lower bounds.

Submitted 14 November, 2017; v1 submitted 16 April, 2017; originally announced April 2017.

Comments: 36 pages, 4 figures

Journal ref: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2018) 605-624
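
The toppling rule described in the sandpile abstract above can be made concrete with a short simulation. This is a minimal sketch under stated assumptions: the $n \times n$ grid is surrounded by a sink, so every cell has degree 4 and grains sent off the boundary are lost, and a cell topples once it holds at least 4 grains (one common convention; the abstract's wording "exceeds their degree" may correspond to a threshold shifted by one). The function name `stabilize` is hypothetical.

```python
def stabilize(grid):
    """Topple an n x n sandpile until every cell holds fewer than 4 grains.

    grid: list of n lists of n non-negative ints. Grains pushed past the
    boundary fall into a sink and disappear (assumed boundary condition).
    Returns (stable configuration, number of topplings performed).
    """
    n = len(grid)
    grid = [row[:] for row in grid]  # do not mutate the caller's grid
    stack = [(r, c) for r in range(n) for c in range(n) if grid[r][c] >= 4]
    topples = 0
    while stack:
        r, c = stack.pop()
        if grid[r][c] < 4:
            continue  # stale entry: this cell is already stable
        grid[r][c] -= 4  # send one grain toward each of the 4 neighbors
        topples += 1
        if grid[r][c] >= 4:
            stack.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n:
                grid[nr][nc] += 1
                if grid[nr][nc] >= 4:
                    stack.append((nr, nc))
    return grid, topples


if __name__ == "__main__":
    start = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
    print(stabilize(start))
```

Because the model is Abelian, the final configuration and the toppling count do not depend on the order in which unstable cells are processed, so the stack order used here is an arbitrary choice.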

arXiv:1611.03385 [pdf, other]

doi 10.1137/1.9781611974782.119

Approximately Sampling Elements with Fixed Rank in Graded Posets

Authors: Prateek Bhakta, Ben Cousins, Matthew Fahrbach, Dana Randall

Abstract: Graded posets frequently arise throughout combinatorics, where it is natural to try to count the number of elements of a fixed rank. These counting problems are often $\#\textbf{P}$-complete, so we consider approximation algorithms for counting and uniform sampling. We show that for certain classes of posets, biased Markov chains that walk along edges of their Hasse diagrams allow us to approximately generate samples with any fixed rank in expected polynomial time. Our arguments do not rely on the typical proofs of log-concavity, which are used to construct a stationary distribution with a specific mode in order to give a lower bound on the probability of outputting an element of the desired rank. Instead, we infer this directly from bounds on the mixing time of the chains through a method we call $\textit{balanced bias}$. A noteworthy application of our method is sampling restricted classes of integer partitions of $n$. We give the first provably efficient Markov chain algorithm to uniformly sample integer partitions of $n$ from general restricted classes. Several observations allow us to improve the efficiency of this chain to require $O(n^{1/2}\log(n))$ space, and for unrestricted integer partitions, expected $O(n^{9/4})$ time. Related applications include sampling permutations with a fixed number of inversions and lozenge tilings on the triangular lattice with a fixed average height.

Submitted 10 November, 2016; originally announced November 2016.

Comments: 23 pages, 12 figures

Journal ref: Proceedings of the 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2017) 1823-1838
