
Efficiently Computing Similarities to Private Datasets

Authors: Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub Tarnawski

Abstract: Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) data structure which approximates $\sum_{x \in X} f(x,y)$ for any query $y$. We consider the cases where $f$ is a kernel function, such as $f(x,y) = e^{-\|x-y\|_2^2/σ^2}$ (also known as DP kernel density estimation), or a distance function such as $f(x,y) = \|x-y\|_2$, among others. Our theoretical results improve upon prior work and give better privacy-utility trade-offs as well as faster query times for a wide range of kernels and distance functions. The unifying approach behind our results is leveraging `low-dimensional structures' present in the specific functions $f$ that we study, using tools such as provable dimensionality reduction, approximation theory, and one-dimensional decomposition of the functions. Our algorithms empirically exhibit improved query times and accuracy over prior state of the art. We also present an application to DP classification. Our experiments demonstrate that the simple methodology of classifying based on average similarity is orders of magnitude faster than prior DP-SGD based approaches for comparable accuracy.

Submitted 13 March, 2024; originally announced March 2024.

Comments: To appear at ICLR 2024
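
As a rough illustration of the quantity named in the abstract above, the following sketch evaluates $\sum_{x \in X} f(x,y)$ exactly, with no privacy mechanism at all, for the Gaussian kernel and for the Euclidean distance. The function names and the default `sigma` are assumptions made for this example; the brute-force sum is only the non-private baseline that a DP data structure would approximate, not the authors' algorithm.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # f(x, y) = exp(-||x - y||_2^2 / sigma^2); sigma is an assumed bandwidth.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)

def l2_distance(x, y):
    # f(x, y) = ||x - y||_2
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def similarity_sum(X, y, f):
    # Exact, NON-private value of sum_{x in X} f(x, y); costs O(|X| * d) per query.
    return sum(f(x, y) for x in X)

X = [(0.0, 0.0), (3.0, 4.0)]
query = (0.0, 0.0)
total_distance = similarity_sum(X, query, l2_distance)
total_kernel = similarity_sum(X, query, gaussian_kernel)
```

Each query here scans the whole dataset, which is the cost that a precomputed data structure with fast query time is meant to avoid.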

arXiv:2402.10806 [pdf, other]

Streaming Algorithms for Connectivity Augmentation

Authors: Ce Jin, Michael Kapralov, Sepideh Mahabadi, Ali Vakilian

Abstract: We study the $k$-connectivity augmentation problem ($k$-CAP) in the single-pass streaming model. Given a $(k-1)$-edge connected graph $G=(V,E)$ that is stored in memory, and a stream of weighted edges $L$ with weights in $\{0,1,\dots,W\}$, the goal is to choose a minimum weight subset $L'\subseteq L$ such that $G'=(V,E\cup L')$ is $k$-edge connected. We give a $(2+ε)$-approximation algorithm for this problem which requires storing $O(ε^{-1} n\log n)$ words. Moreover, we show our result is tight: Any algorithm with better than $2$-approximation for the problem requires $Ω(n^2)$ bits of space even when $k=2$. This establishes a gap between the optimal approximation factor one can obtain in the streaming vs the offline setting for $k$-CAP. We further consider a natural generalization to the fully streaming model where both $E$ and $L$ arrive in the stream in an arbitrary order. We show that this problem has a space lower bound that matches the best possible size of a spanner of the same approximation ratio. Following this, we give improved results for spanners on weighted graphs: We show a streaming algorithm that finds a $(2t-1+ε)$-approximate weighted spanner of size at most $O(ε^{-1} n^{1+1/t}\log n)$ for integer $t$, whereas the best prior streaming algorithm for spanners on weighted graphs had size depending on $\log W$. Using our spanner result, we provide an optimal $O(t)$-approximation for $k$-CAP in the fully streaming model with $O(nk + n^{1+1/t})$ words of space.
Finally, we apply our results to network design problems such as the Steiner tree augmentation problem (STAP), $k$-edge connected spanning subgraph ($k$-ECSS), and the general Survivable Network Design problem (SNDP). In particular, we show a single-pass $O(t\log k)$-approximation for SNDP using $O(kn^{1+1/t})$ words of space, where $k$ is the maximum connectivity requirement.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2310.08122 [pdf, other]

Core-sets for Fair and Diverse Data Summarization

Authors: Sepideh Mahabadi, Stojan Trajanovski

Abstract: We study core-set construction algorithms for the task of Diversity Maximization under a fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural diversity measures: sum-of-pairwise distances and sum-of-nearest-neighbor distances, and show improved core-set construction algorithms with respect to these measures. More precisely, we show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. Second, we show the first core-set w.r.t. the sum-of-nearest-neighbor distances. Finally, we run several experiments showing the effectiveness of our core-set approach. In particular, we apply constrained diversity maximization to summarize a set of timed messages in a way that takes into account the messages' recency. Specifically, the summary should include more recent messages compared to older ones. This is a real task in one of the largest communication platforms, affecting the experience of hundreds of millions of daily active users. By utilizing our core-set method for this task, we achieve a 100x speed-up while losing only a few percent of the diversity. Moreover, our approach allows us to improve the space usage of the algorithm in the streaming setting.

Submitted 12 October, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2310.00175 [pdf, other]

Tight Bounds for Volumetric Spanners and Applications

Authors: Aditya Bhaskara, Sepideh Mahabadi, Ali Vakilian

Abstract: Given a set of points of interest, a volumetric spanner is a subset of the points using which all the points can be expressed using "small" coefficients (measured in an appropriate norm). Formally, given a set of vectors $X = \{v_1, v_2, \dots, v_n\}$, the goal is to find $T \subseteq [n]$ such that every $v \in X$ can be expressed as $\sum_{i\in T} α_i v_i$, with $\|α\|$ being small. This notion, which has also been referred to as a well-conditioned basis, has found several applications, including bandit linear optimization, determinant maximization, and matrix low rank approximation. In this paper, we give almost optimal bounds on the size of volumetric spanners for all $\ell_p$ norms, and show that they can be constructed using a simple local search procedure. We then show the applications of our result to other tasks and in particular the problem of finding coresets for the Minimum Volume Enclosing Ellipsoid (MVEE) problem.

Submitted 29 September, 2023; originally announced October 2023.

Comments: NeurIPS 2023

arXiv:2309.15286 [pdf, other]

Composable Coresets for Determinant Maximization: Greedy is Almost Optimal

Authors: Siddharth Gollapudi, Sepideh Mahabadi, Varun Sivashankar

Abstract: Given a set of $n$ vectors in $\mathbb{R}^d$, the goal of the \emph{determinant maximization} problem is to pick $k$ vectors with the maximum volume. Determinant maximization is the MAP-inference task for determinantal point processes (DPP) and has recently received considerable attention for modeling diversity. As most applications for the problem use large amounts of data, this problem has been studied in the relevant \textit{composable coreset} setting. In particular, [Indyk-Mahabadi-OveisGharan-Rezaei--SODA'20, ICML'19] showed that one can get composable coresets with optimal approximation factor of $\tilde O(k)^k$ for the problem, and that a local search algorithm achieves an almost optimal approximation guarantee of $O(k)^{2k}$. In this work, we show that the widely-used Greedy algorithm also provides composable coresets with an almost optimal approximation factor of $O(k)^{3k}$, which improves over the previously known guarantee of $C^{k^2}$, and supports the prior experimental results showing the practicality of the greedy algorithm as a coreset. Our main result follows by showing a local optimality property for Greedy: swapping a single point from the greedy solution with a vector that was not picked by the greedy algorithm can increase the volume by a factor of at most $(1+\sqrt{k})$. This is tight up to the additive constant $1$. Finally, our experiments show that the local optimality of the greedy algorithm is even lower than the theoretical bound on real data sets.

Submitted 26 September, 2023; originally announced September 2023.

Comments: Accepted to NeurIPS 2023

ACM Class: F.2.0; G.1.2; G.1.6; G.2.2
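
For readers unfamiliar with greedy volume maximization, here is a minimal sketch of the usual form of the procedure: repeatedly pick the vector that lies farthest from the span of the vectors chosen so far. The name `greedy_volume` and the plain-list representation are assumptions for this example, and the abstract does not spell out the exact variant or tie-breaking the authors analyze.

```python
import math

def greedy_volume(vectors, k):
    # residuals[i] holds the component of vectors[i] orthogonal to the
    # span of the vectors selected so far (Gram-Schmidt style).
    residuals = [list(v) for v in vectors]
    chosen = []
    for _ in range(k):
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        if not candidates:
            break
        # Farthest from the current span = largest residual norm.
        best = max(candidates, key=lambda i: sum(c * c for c in residuals[i]))
        norm = math.sqrt(sum(c * c for c in residuals[best]))
        if norm == 0.0:
            break  # remaining vectors add no volume
        chosen.append(best)
        unit = [c / norm for c in residuals[best]]
        # Project the new direction out of every residual.
        for r in residuals:
            dot = sum(a * b for a, b in zip(r, unit))
            for j in range(len(r)):
                r[j] -= dot * unit[j]
    return chosen

picked = greedy_volume([(1.0, 0.0), (2.0, 0.0), (0.0, 1.0)], 2)
```

On this toy input the longest vector is taken first, after which the parallel vector has zero residual and the orthogonal one is selected.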

arXiv:2307.04329 [pdf, ps, other]

Improved Diversity Maximization Algorithms for Matching and Pseudoforest

Authors: Sepideh Mahabadi, Shyam Narayanan

Abstract: In this work we consider the diversity maximization problem, where given a data set $X$ of $n$ elements, and a parameter $k$, the goal is to pick a subset of $X$ of size $k$ maximizing a certain diversity measure. [CH01] defined a variety of diversity measures based on pairwise distances between the points. A constant factor approximation algorithm was known for all those diversity measures except ``remote-matching'', where only an $O(\log k)$ approximation was known. In this work we present an $O(1)$ approximation for this remaining notion. Further, we consider these notions from the perspective of composable coresets. [IMMM14] provided composable coresets with a constant factor approximation for all but ``remote-pseudoforest'' and ``remote-matching'', for which again they only obtained an $O(\log k)$ approximation. Here we also close the gap up to constants and present a constant factor composable coreset algorithm for these two notions. For remote-matching, our coreset has size only $O(k)$, and for remote-pseudoforest, our coreset has size $O(k^{1+\varepsilon})$ for any $\varepsilon > 0$, for an $O(1/\varepsilon)$-approximate coreset.

Submitted 9 July, 2023; originally announced July 2023.

Comments: 27 pages, 1 table. Accepted to APPROX, 2023

arXiv:2306.06778 [pdf, other]

Approximation Algorithms for Fair Range Clustering

Authors: Sèdjro S. Hotegni, Sepideh Mahabadi, Ali Vakilian

Abstract: This paper studies the fair range clustering problem in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of $n$ points in a metric space $(P,d)$ where each point belongs to one of the $\ell$ different demographics (i.e., $P = P_1 \uplus P_2 \uplus \cdots \uplus P_\ell$) and a set of $\ell$ intervals $[α_1, β_1], \cdots, [α_\ell, β_\ell]$ on the desired number of centers from each group, the goal is to pick a set of $k$ centers $C$ with minimum $\ell_p$-clustering cost (i.e., $(\sum_{v\in P} d(v,C)^p)^{1/p}$) such that for each group $i\in [\ell]$, $|C\cap P_i| \in [α_i, β_i]$. In particular, the fair range $\ell_p$-clustering captures fair range $k$-center, $k$-median and $k$-means as its special cases. In this work, we provide efficient constant factor approximation algorithms for fair range $\ell_p$-clustering for all values of $p\in [1,\infty)$.

Submitted 22 June, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

Comments: ICML 2023
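
The two ingredients of the problem statement above, the $\ell_p$-clustering cost $(\sum_{v\in P} d(v,C)^p)^{1/p}$ and the per-group range constraint $|C\cap P_i| \in [α_i, β_i]$, can be checked for a candidate solution with a few lines of code. This is only an evaluator under assumed names (`lp_clustering_cost`, `satisfies_ranges`) and the Euclidean metric; it is not the approximation algorithm from the paper.

```python
import math

def lp_clustering_cost(points, centers, p):
    # (sum_v d(v, C)^p)^(1/p), where d(v, C) is the distance from v
    # to its nearest center. Euclidean distance is assumed here.
    total = sum(min(math.dist(v, c) for c in centers) ** p for v in points)
    return total ** (1.0 / p)

def satisfies_ranges(center_groups, ranges):
    # center_groups: the group label of each chosen center.
    # ranges: dict mapping group label -> (alpha, beta), both inclusive.
    return all(alpha <= center_groups.count(g) <= beta
               for g, (alpha, beta) in ranges.items())

points = [(0,), (2,), (5,)]
centers = [(0,), (5,)]
cost_p1 = lp_clustering_cost(points, centers, 1)   # k-median style cost
cost_p2 = lp_clustering_cost(points, centers, 2)   # k-means style cost (root)
ranges = {"a": (1, 1), "b": (1, 2)}
feasible = satisfies_ranges(["a", "b"], ranges)
infeasible = satisfies_ranges(["a", "a"], ranges)
```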

arXiv:2211.00289 [pdf, ps, other]

Composable Coresets for Constrained Determinant Maximization and Beyond

Authors: Sepideh Mahabadi, Thuy-Duong Vuong

Abstract: We study the task of determinant maximization under partition constraint, in the context of large data sets. Given a point set $V\subset \mathbb{R}^d$ that is partitioned into $s$ groups $V_1,..., V_s$, and integers $k_1,...,k_s$ where $k=\sum_i k_i$, the goal is to pick $k_i$ points from group $i$ such that the overall determinant of the picked $k$ points is maximized. Determinant Maximization and its constrained variants have gained a lot of interest for modeling diversity and have found applications in the context of fairness and data summarization. We study the design of composable coresets for the constrained determinant maximization problem. Composable coresets are small subsets of the data that (approximately) preserve optimal solutions to optimization tasks and enable efficient solutions in several other large data models including the distributed and the streaming settings. In this work, we consider two regimes. For the case of $k>d$, we show a peeling algorithm that gives us a composable coreset of size $kd$ with an approximation factor of $d^{O(d)}$. We complement our results by showing that this approximation factor is tight. For the case of $k\leq d$, we show that a simple modification of the previous algorithms results in an optimal coreset verified by our lower bounds. Our results apply to all strongly Rayleigh distributions and several other experimental design problems. In addition, we show coreset construction algorithms under the more general laminar matroid constraints.

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2207.07822 [pdf, ps, other]

Adaptive Sketches for Robust Regression with Importance Sampling

Authors: Sepideh Mahabadi, David P. Woodruff, Samson Zhou

Abstract: We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale machine learning, it is well-known for possibly experiencing slow convergence rates due to the high variance from uniform sampling. On the other hand, importance sampling can significantly decrease the variance but is usually difficult to implement because computing the sampling probabilities requires additional passes over the data, in which case standard gradient descent (GD) could be used instead. In this paper, we introduce an algorithm that approximately samples $T$ gradients of dimension $d$ from nearly the optimal importance sampling distribution for a robust regression problem over $n$ rows. Thus our algorithm effectively runs $T$ steps of SGD with importance sampling while using sublinear space and just making a single pass over the data. Our techniques also extend to performing importance sampling for second-order optimization.

Submitted 15 July, 2022; originally announced July 2022.

Comments: RANDOM 2022

arXiv:2101.10905 [pdf, other]

Sampling a Near Neighbor in High Dimensions -- Who is the Fairest of Them All?

Authors: Martin Aumüller, Sariel Har-Peled, Sepideh Mahabadi, Rasmus Pagh, Francesco Silvestri

Abstract: Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points $S$ and a radius parameter $r>0$, the $r$-near neighbor ($r$-NN) problem asks for a data structure that, given any query point $q$, returns a point $p$ within distance at most $r$ from $q$. In this paper, we study the $r$-NN problem in the light of individual fairness and providing equal opportunities: all points that are within distance $r$ from the query should have the same probability to be returned. In the low-dimensional case, this problem was first studied by Hu, Qiao, and Tao (PODS 2014). Locality sensitive hashing (LSH), the theoretically strongest approach to similarity search in high dimensions, does not provide such a fairness guarantee. In this work, we show that LSH based algorithms can be made fair, without a significant loss in efficiency. We propose several efficient data structures for the exact and approximate variants of the fair NN problem. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. We also develop a data structure for fair similarity search under inner product that requires nearly-linear space and exploits locality sensitive filters. The paper concludes with an experimental evaluation that highlights the inherent unfairness of NN data structures and shows the performance of our algorithms on real-world datasets.

Submitted 26 January, 2021; originally announced January 2021.

Comments: arXiv admin note: text overlap with arXiv:1906.02640
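
The fairness requirement stated in the abstract above (every point within distance $r$ of the query is returned with the same probability) is easy to meet with a linear scan, which makes a useful reference point for what the LSH-based structures must achieve faster. The sketch below is that brute-force baseline only; the function name and the `rng` parameter are assumptions for this example, and it does not reflect the data structures proposed in the paper.

```python
import math
import random

def fair_near_neighbor(S, q, r, rng=random):
    # Collect the whole r-neighborhood of q with an O(|S|) scan,
    # then return one of its points uniformly at random.
    ball = [p for p in S if math.dist(p, q) <= r]
    return rng.choice(ball) if ball else None

S = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
rng = random.Random(0)
draws = {fair_near_neighbor(S, (0.0, 0.0), 1.5, rng) for _ in range(200)}
empty = fair_near_neighbor(S, (9.0, 9.0), 0.1, rng)
```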

arXiv:2011.06545 [pdf, other]

Towards Better Approximation of Graph Crossing Number

Authors: Julia Chuzhoy, Sepideh Mahabadi, Zihan Tan

Abstract: Graph Crossing Number is a fundamental problem with various applications. In this problem, the goal is to draw an input graph $G$ in the plane so as to minimize the number of crossings between the images of its edges. Despite extensive work, non-trivial approximation algorithms are only known for bounded-degree graphs. Even for this special case, the best current algorithm achieves a $\tilde O(\sqrt n)$-approximation, while the best current negative result is APX-hardness. All current approximation algorithms for the problem build on the same paradigm: compute a set $E'$ of edges (called a \emph{planarizing set}) such that $G\setminus E'$ is planar; compute a planar drawing of $G\setminus E'$; then add the drawings of the edges of $E'$ to the resulting drawing. Unfortunately, there are examples of graphs, in which any implementation of this method must incur $Ω(\text{OPT}^2)$ crossings, where $\text{OPT}$ is the value of the optimal solution. This barrier seems to doom the only known approach to designing approximation algorithms for the problem, and to prevent it from yielding a better than $O(\sqrt n)$-approximation. In this paper we propose a new paradigm that allows us to overcome this barrier. We show an algorithm that, given a bounded-degree graph $G$ and a planarizing set $E'$ of its edges, computes another set $E''$ with $E'\subseteq E''$, such that $|E''|$ is relatively small, and there exists a near-optimal drawing of $G$ in which only edges of $E''$ participate in crossings.
This allows us to reduce the Crossing Number problem to \emph{Crossing Number with Rotation System} -- a variant in which the ordering of the edges incident to every vertex is fixed as part of input. We show a randomized algorithm for this new problem, that allows us to obtain an $O(n^{1/2-ε})$-approximation for Crossing Number on bounded-degree graphs, for some constant $ε>0$.

Submitted 10 January, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

arXiv:2007.03633 [pdf, ps, other]

Streaming Complexity of SVMs

Authors: Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff

Abstract: We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn lots of attention, including for developing fast algorithms for solving the problem approximately. One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires only $O(\frac{1}{λε})$ random samples, and which immediately yields a streaming algorithm that uses $O(\frac{d}{λε})$ space. For related problems, better streaming algorithms are only known for smooth functions, unlike the SVM objective that we focus on in this work. We initiate an investigation of the space complexity for both finding an approximate optimum of this objective, and for the related ``point estimation'' problem of sketching the data set to evaluate the function value $F_λ$ on any query $(θ, b)$. We show that, for both problems, for dimensions $d=1,2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{λε}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$. We also prove polynomial lower bounds for both point estimation and optimization. In particular, for point estimation we obtain a tight bound of $Θ(1/\sqrt{ε})$ for $d=1$ and a nearly tight lower bound of $\widetilde{Ω}(d/ε^2)$ for $d = Ω( \log(1/ε))$.
Finally, for optimization, we prove an $Ω(1/\sqrt{ε})$ lower bound for $d = Ω( \log(1/ε))$, and show similar bounds when $d$ is constant.

Submitted 7 July, 2020; originally announced July 2020.

Comments: APPROX 2020

arXiv:2004.10969 [pdf, ps, other]

Non-Adaptive Adaptive Sampling on Turnstile Streams

Authors: Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

Abstract: Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly selects rows of an underlying matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$, where $n\gg d$, with probabilities proportional to their distances to the subspace of the previously selected rows. Intuitively, adaptive sampling seems to be limited to trivial multi-pass algorithms in the streaming model of computation due to its inherently sequential nature of assigning sampling probabilities to each row only after the previous iteration is completed. Surprisingly, we show this is not the case by giving the first one-pass algorithms for adaptive sampling on turnstile streams that use space $\text{poly}(d,k,\log n)$, where $k$ is the number of adaptive sampling rounds to be performed. Our adaptive sampling procedure has a number of applications to various data summarization problems that either improve state-of-the-art or have only been previously studied in the more relaxed row-arrival model. We give the first relative-error algorithms for column subset selection, subspace approximation, projective clustering, and volume maximization on turnstile streams that use space sublinear in $n$. We complement our volume maximization algorithmic results with lower bounds that are tight up to lower order terms, even for multi-pass algorithms.
By a similar construction, we also obtain lower bounds for volume maximization in the row-arrival model, which we match with competitive upper bounds. See paper for full abstract.

Submitted 23 April, 2020; originally announced April 2020.

Comments: To appear at STOC 2020
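
To make the sequential nature of adaptive sampling concrete, the sketch below implements the classical centralized, multi-round procedure as the abstract describes it: in each round a row is drawn with probability proportional to its distance from the subspace spanned by the previously selected rows. The function name, the `rng` parameter, and the list-of-rows representation are assumptions for this example. This is the offline baseline, not the one-pass turnstile algorithm from the paper.

```python
import math
import random

def adaptive_sample(A, k, rng=random):
    # residuals[i] is the part of row i orthogonal to the selected rows.
    residuals = [list(row) for row in A]
    selected = []
    for _ in range(k):
        # Distance of each row to the current subspace.
        weights = [math.sqrt(sum(c * c for c in r)) for r in residuals]
        if sum(weights) == 0.0:
            break  # every row already lies in the selected subspace
        i = rng.choices(range(len(A)), weights=weights)[0]
        selected.append(i)
        unit = [c / weights[i] for c in residuals[i]]
        # Remove the newly covered direction from all residuals.
        for r in residuals:
            dot = sum(a * b for a, b in zip(r, unit))
            for j in range(len(r)):
                r[j] -= dot * unit[j]
        # Guard against floating-point leftovers on the chosen row.
        residuals[i] = [0.0] * len(residuals[i])
    return selected

A = [(1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
samples = [adaptive_sample(A, 2, random.Random(seed)) for seed in range(20)]
```

Each round needs the residuals left by the previous round, which is why a naive streaming implementation would take one pass per round.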

arXiv:2002.06742 [pdf, other]

Individual Fairness for $k$-Clustering

Authors: Sepideh Mahabadi, Ali Vakilian

Abstract: We give a local search based algorithm for $k$-median and $k$-means (and more generally for any $k$-clustering with $\ell_p$ norm cost function) from the perspective of individual fairness. More precisely, for a point $x$ in a point set $P$ of size $n$, let $r(x)$ be the minimum radius such that the ball of radius $r(x)$ centered at $x$ has at least $n/k$ points from $P$. Intuitively, if a set of $k$ random points is chosen from $P$ as centers, every point $x\in P$ expects to have a center within radius $r(x)$. An individually fair clustering provides such a guarantee for every point $x\in P$. This notion of fairness was introduced in [Jung et al., 2019] where they showed how to get an approximately feasible $k$-clustering with respect to this fairness condition. In this work, we show how to get a bicriteria approximation for fair $k$-clustering: The $k$-median ($k$-means) cost of our solution is within a constant factor of the cost of an optimal fair $k$-clustering, and our solution approximately satisfies the fairness condition (also within a constant factor). Further, we complement our theoretical bounds with empirical evaluation.

Submitted 21 September, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

Comments: ICML 2020
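
The fairness radius $r(x)$ defined in the abstract above can be computed directly from its definition, and a candidate set of centers can then be tested against it. In this sketch the names `fair_radius` and `is_alpha_fair`, the Euclidean metric, and the convention that $x$ counts as a point of its own ball are assumptions; the relaxation factor `alpha` stands in for the "within a constant factor" guarantee and is not a value taken from the paper.

```python
import math

def fair_radius(P, x, k):
    # Smallest radius whose ball around x holds at least n/k points of P.
    # x itself is counted, so this is the ceil(n/k)-th smallest distance.
    need = math.ceil(len(P) / k)
    dists = sorted(math.dist(x, p) for p in P)
    return dists[need - 1]

def is_alpha_fair(P, centers, k, alpha=1.0):
    # Every point must have a center within alpha * r(x).
    return all(
        min(math.dist(x, c) for c in centers) <= alpha * fair_radius(P, x, k)
        for x in P
    )

P = [(0.0,), (1.0,), (2.0,), (10.0,)]
radii = [fair_radius(P, x, 2) for x in P]
fair = is_alpha_fair(P, [(1.0,), (10.0,)], 2)
unfair = is_alpha_fair(P, [(0.0,), (1.0,)], 2)
relaxed = is_alpha_fair(P, [(0.0,), (1.0,)], 2, alpha=2.0)
```

The isolated point at 10 has a large radius, so it tolerates a distant center, but only up to that radius unless the condition is relaxed.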

arXiv:1907.03197 [pdf, other]

Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm

Authors: Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, Alireza Rezaei

Abstract: ``Composable core-sets'' are an efficient framework for solving optimization problems in massive data models. In this work, we consider efficient construction of composable core-sets for the determinant maximization problem. This can also be cast as the MAP inference task for determinantal point processes, which have recently gained a lot of interest for modeling diversity and fairness. The problem was recently studied in [IMOR'18], where they designed composable core-sets with the optimal approximation bound of $\tilde O(k)^k$. On the other hand, the more practical Greedy algorithm has been previously used in similar contexts. In this work, first we provide a theoretical approximation guarantee of $O(C^{k^2})$ for the Greedy algorithm in the context of composable core-sets; further, we propose to use a Local Search based algorithm that, while still being practical, achieves a nearly optimal approximation bound of $O(k)^{2k}$; finally, we implement all three algorithms and show the effectiveness of our proposed algorithm on standard data sets.

Submitted 6 July, 2019; originally announced July 2019.

Comments: This paper has appeared in the 36th International Conference on Machine Learning (ICML), 2019. This is an equal contribution paper

ACM Class: F.2.0; G.1.2; G.1.6; G.2.2

arXiv:1906.02640 [pdf, other]

Near Neighbor: Who is the Fairest of Them All?

Authors: Sariel Har-Peled, Sepideh Mahabadi

Abstract: $\newcommand{\ball}{\mathbb{B}}\newcommand{\dsQ}{\mathcal{Q}}\newcommand{\dsS}{\mathcal{S}}$In this work we study a fair variant of the near neighbor problem. Namely, given a set of $n$ points $P$ and a parameter $r$, the goal is to preprocess the points, such that given a query point $q$, any point in the $r$-neighborhood of the query, i.e., $\ball(q,r)$, has the same probability of being reported as the near neighbor. We show that LSH based algorithms can be made fair, without a significant loss in efficiency. Specifically, we show an algorithm that reports a point in the $r$-neighborhood of a query $q$ with almost uniform probability. The query time is proportional to $O\bigl( \mathrm{dns}(q,r) \dsQ(n,c) \bigr)$, and its space is $O(\dsS(n,c))$, where $\dsQ(n,c)$ and $\dsS(n,c)$ are the query time and space of an LSH algorithm for $c$-approximate near neighbor, and $\mathrm{dns}(q,r)$ is a function of the local density around $q$. Our approach works more generally for sampling uniformly from a sub-collection of sets of a given collection and can be used in a few other applications. Finally, we run experiments to show the performance of our approach on real data.

Submitted 21 November, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

Comments: To appear in NIPS 2019

arXiv:1902.03534 [pdf, ps, other]

Set Cover in Sub-linear Time

Authors: Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Ali Vakilian, Anak Yodpinyanee

Abstract: We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in Har-Peled et al. [2016] to the sub-linear query model, that returns an $α$-approximate cover using $\tilde{O}(m(n/k)^{1/(α-1)} + nk)$ queries to the input, where $k$ denotes the value of a minimum set cover. We then complement this upper bound by proving that for lower values of $k$, the required number of queries is $\tilde{Ω}(m(n/k)^{1/(2α)})$, even for estimating the optimal cover size. Moreover, we prove that even checking whether a given collection of sets covers all the elements would require $Ω(nk)$ queries. These two lower bounds provide strong evidence that the upper bound is almost tight for certain values of the parameter $k$. On the other hand, we show that this bound is not optimal for larger values of the parameter $k$, as there exists a $(1+\varepsilon)$-approximation algorithm with $\tilde{O}(mn/k\varepsilon^2)$ queries. We show that this bound is essentially tight for sufficiently small constant $\varepsilon$, by establishing a lower bound of $\tilde{Ω}(mn/k)$ query complexity.

Submitted 9 February, 2019; originally announced February 2019.

arXiv:1811.03591 [pdf, other]

Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions

Authors: Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn

Abstract: We introduce and study the notion of an outer bi-Lipschitz extension of a map between Euclidean spaces. The notion is a natural analogue of the notion of a Lipschitz extension of a Lipschitz map. We show that for every map $f$ there exists an outer bi-Lipschitz extension $f'$ whose distortion is greater than that of $f$ by at most a constant factor. This result can be seen as a counterpart of the classic Kirszbraun theorem for outer bi-Lipschitz extensions. We also study outer bi-Lipschitz extensions of near-isometric maps and show upper and lower bounds for them. Then, we present applications of our results to prioritized and terminal dimension reduction problems.
* We prove a prioritized variant of the Johnson-Lindenstrauss lemma: given a set of points $X\subset \mathbb{R}^d$ of size $N$ and a permutation ("priority ranking") of $X$, there exists an embedding $f$ of $X$ into $\mathbb{R}^{O(\log N)}$ with distortion $O(\log \log N)$ such that the point of rank $j$ has only $O(\log^{3 + \varepsilon} j)$ non-zero coordinates (more specifically, all but the first $O(\log^{3+\varepsilon} j)$ coordinates are equal to $0$); the distortion of $f$ restricted to the first $j$ points (according to the ranking) is at most $O(\log\log j)$. The result makes progress towards answering an open question by Elkin, Filtser, and Neiman about prioritized dimension reductions.
* We prove that given a set $X$ of $N$ points in $\mathbb{R}^d$, there exists a terminal dimension reduction embedding of $\mathbb{R}^d$ into $\mathbb{R}^{d'}$, where $d' = O\left(\frac{\log N}{\varepsilon^4}\right)$, which preserves distances $\|x-y\|$ between points $x\in X$ and $y \in \mathbb{R}^{d}$, up to a multiplicative factor of $1 \pm \varepsilon$. This improves a recent result by Elkin, Filtser, and Neiman.
The dimension reductions that we obtain are nonlinear, and this nonlinearity is necessary.

Submitted 8 November, 2018; originally announced November 2018.

Comments: 27 pages, 6 figures; an extended abstract appeared in the proceedings of STOC 2018

arXiv:1807.11648 [pdf, ps, other]

Composable Core-sets for Determinant Maximization Problems via Spectral Spanners

Authors: Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, Alireza Rezaei

Abstract: We study a generalization of classical combinatorial graph spanners to the spectral setting. Given a set of vectors $V\subseteq \Re^d$, we say a set $U\subseteq V$ is an $α$-spectral spanner if for all $v\in V$ there is a probability distribution $μ_v$ supported on $U$ such that $$vv^\intercal \preceq α\cdot\mathbb{E}_{u\simμ_v} uu^\intercal.$$ We show that any set $V$ has an $\tilde{O}(d)$-spectral spanner of size $\tilde{O}(d)$ and this bound is almost optimal in the worst case. We use spectral spanners to study composable core-sets for spectral problems. We show that for many objective functions one can use a spectral spanner, independent of the underlying functions, as a core-set and obtain almost optimal composable core-sets. For example, for the determinant maximization problem we obtain an $\tilde{O}(k)^k$-composable core-set and we show that this is almost optimal in the worst case. Our algorithm is a spectral analogue of the classical greedy algorithm for finding (combinatorial) spanners in graphs. We expect that our spanners will find many other applications in distributed or parallel models of computation. Our proof is spectral. As a side result of our techniques, we show that the rank of diagonally dominant lower-triangular matrices is robust under `small perturbations', which could be of independent interest.

Submitted 16 November, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

Comments: To appear in SODA 2020

arXiv:1704.02546 [pdf, ps, other]

LSH on the Hypercube Revisited

Authors: Sariel Har-Peled, Sepideh Mahabadi

Abstract: LSH (locality sensitive hashing) has emerged as a powerful technique in nearest-neighbor search in high dimensions [IM98, HIM12]. Given a point set $P$ in a metric space, and given parameters $r$ and $\varepsilon > 0$, the task is to preprocess the point set, such that given a query point $q$, one can quickly decide if $q$ is at distance $\leq r$ or $\geq (1+\varepsilon)r$ from the point set $P$. Once such a near-neighbor data-structure is available, one can reduce the general nearest-neighbor search to a logarithmic number of queries in such structures [IM98, Har01, HIM12]. In this note, we revisit the most basic settings, where $P$ is a set of points in the binary hypercube $\{0,1\}^d$, under the $L_1$/Hamming metric, and present a short description of the LSH scheme in this case. We emphasize that there is no new contribution in this note, except (maybe) the presentation itself, which is inspired by the authors' recent work [HM17].

Submitted 8 April, 2017; originally announced April 2017.
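
For readers unfamiliar with the hypercube setting described in the abstract above, the classical bit-sampling scheme hashes a point of $\{0,1\}^d$ by reading a few randomly chosen coordinates, and repeats this over several independent tables. The sketch below illustrates that general idea only; the class name `BitSamplingLSH` and the parameters `bits_per_hash`, `num_tables`, and `seed` are hypothetical, and no claim is made that this matches the presentation or parameter choices in the note itself.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, ...]


def hamming(a: Point, b: Point) -> int:
    """Number of coordinates in which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


class BitSamplingLSH:
    """Toy bit-sampling LSH index for points in {0,1}^d."""

    def __init__(self, dim: int, bits_per_hash: int, num_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # Each table reads its own random sample of coordinates.
        self.samples: List[List[int]] = [
            [rng.randrange(dim) for _ in range(bits_per_hash)]
            for _ in range(num_tables)
        ]
        self.tables: List[Dict[Point, List[Point]]] = [
            defaultdict(list) for _ in range(num_tables)
        ]

    def _key(self, table: int, p: Point) -> Point:
        return tuple(p[i] for i in self.samples[table])

    def insert(self, p: Point) -> None:
        for t, buckets in enumerate(self.tables):
            buckets[self._key(t, p)].append(p)

    def query(self, q: Point, radius: int) -> Optional[Point]:
        """Return some stored point within Hamming distance `radius` of q
        that shares a bucket with q, or None if no bucket contains one."""
        for t, buckets in enumerate(self.tables):
            for p in buckets.get(self._key(t, q), []):
                if hamming(p, q) <= radius:
                    return p
        return None


index = BitSamplingLSH(dim=8, bits_per_hash=3, num_tables=4)
index.insert((0, 0, 0, 0, 0, 0, 0, 0))
print(index.query((0, 0, 0, 0, 0, 0, 0, 0), radius=0))
```

Two points at Hamming distance $t$ agree on one sampled coordinate with probability $1 - t/d$, so near points collide in a table more often than far points; using more bits per hash sharpens that gap, and using more tables raises the chance that a near point is found at all.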

arXiv:1609.08739 [pdf, other]

doi 10.4230/LIPIcs.ICALP.2018.392

Approximate Sparse Linear Regression

Authors: Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi

Abstract: In the Sparse Linear Regression (SLR) problem, given a $d \times n$ matrix $M$ and a $d$-dimensional query $q$, the goal is to compute a $k$-sparse $n$-dimensional vector $τ$ such that the error $\|Mτ-q\|$ is minimized. This problem is equivalent to the following geometric problem: given a set $P$ of $n$ points and a query point $q$ in $d$ dimensions, find the closest $k$-dimensional subspace to $q$ that is spanned by a subset of $k$ points in $P$. In this paper, we present data-structures/algorithms and conditional lower bounds for several variants of this problem (such as finding the closest induced $k$-dimensional flat/simplex instead of a subspace). In particular, we present approximation algorithms for the online variants of the above problems with query time $\tilde O(n^{k-1})$, which are of interest in the "low sparsity regime" where $k$ is small, e.g., $2$ or $3$. For $k=d$, this matches, up to polylogarithmic factors, the lower bound that relies on the affinely degenerate conjecture (i.e., deciding if $n$ points in $\mathbb{R}^d$ contain $d+1$ points contained in a hyperplane takes $Ω(n^d)$ time). Moreover, our algorithms involve formulating and solving several geometric subproblems, which we believe to be of independent interest.

Submitted 28 April, 2018; v1 submitted 27 September, 2016; originally announced September 2016.

arXiv:1604.02188 [pdf, other]

doi 10.4230/LIPIcs.SoCG.2016.44

Simultaneous Nearest Neighbor Search

Authors: Piotr Indyk, Robert Kleinberg, Sepideh Mahabadi, Yang Yuan

Abstract: Motivated by applications in computer vision and databases, we introduce and study the Simultaneous Nearest Neighbor Search (SNN) problem. Given a set of data points, the goal of SNN is to design a data structure that, given a collection of queries, finds a collection of close points that are compatible with each other. Formally, we are given $k$ query points $Q=q_1,\cdots,q_k$, and a compatibility graph $G$ with vertices in $Q$, and the goal is to return data points $p_1,\cdots,p_k$ that minimize (i) the weighted sum of the distances from $q_i$ to $p_i$ and (ii) the weighted sum, over all edges $(i,j)$ in the compatibility graph $G$, of the distances between $p_i$ and $p_j$. The problem has several applications, where one wants to return a set of consistent answers to multiple related queries. This generalizes well-studied computational problems, including NN, Aggregate NN and the 0-extension problem. In this paper we propose and analyze the following general two-step method for designing efficient data structures for SNN. In the first step, for each query point $q_i$ we find its (approximate) nearest neighbor point $\hat{p}_i$; this can be done efficiently using existing approximate nearest neighbor structures. In the second step, we solve an off-line optimization problem over sets $q_1,\cdots,q_k$ and $\hat{p}_1,\cdots,\hat{p}_k$; this can be done efficiently given that $k$ is much smaller than $n$.
Even though $\hat{p}_1,\cdots,\hat{p}_k$ might not constitute the optimal answers to queries $q_1,\cdots,q_k$, we show that, for the unweighted case, the resulting algorithm is an $O(\log k/\log \log k)$-approximation. Also, we show that the approximation factor can in fact be reduced to a constant for compatibility graphs frequently occurring in practice. Finally, we show that the "empirical approximation factor" provided by the above approach is very close to 1.

Submitted 7 April, 2016; originally announced April 2016.

ACM Class: F.2.2

arXiv:1511.07357 [pdf, ps, other]

Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search

Authors: Sariel Har-Peled, Sepideh Mahabadi

Abstract: We introduce a new variant of the nearest neighbor search problem, which allows for some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of $n$ points $P=\{ x_1,\ldots, x_n\}$ in high dimensions, and a parameter $k$, the goal is to preprocess the dataset, such that given a query point $q$, one can quickly compute a point $x \in P$, such that the distance of the query to the point $x$ is minimized, when ignoring the "optimal" $k$ coordinates. Note that the coordinates being ignored are a function of both the query point and the point returned. We present a general reduction from this problem to answering ANN queries, which is similar in spirit to LSH (locality sensitive hashing) [IM98]. Specifically, we give a sampling technique which achieves a bi-criterion approximation for this problem. If the distance to the nearest neighbor after ignoring $k$ coordinates is $r$, the data-structure returns a point that is within a distance of $O(r)$ after ignoring $O(k)$ coordinates. We also present other applications and further extensions and refinements of the above result. The new data-structures are simple and (arguably) elegant, and should be practical; specifically, all bounds are polynomial in all relevant parameters (including the dimension of the space, and the robustness parameter $k$).

Submitted 23 November, 2015; originally announced November 2015.
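
The objective in the abstract above, distance after ignoring the "optimal" $k$ coordinates, can be made concrete with a brute-force baseline. The sketch assumes the $\ell_1$ distance (the abstract does not fix a norm), in which case the best coordinates to ignore are simply the $k$ with the largest absolute differences. The function names `robust_distance` and `robust_nearest` are hypothetical, and this linear scan is only a reference point, not the sampling-based data structure the paper describes.

```python
from typing import Sequence


def robust_distance(x: Sequence[float], q: Sequence[float], k: int) -> float:
    """l1 distance between x and q after dropping the k largest coordinate gaps."""
    diffs = sorted(abs(a - b) for a, b in zip(x, q))
    keep = max(len(diffs) - k, 0)  # number of coordinates that still count
    return sum(diffs[:keep])


def robust_nearest(points: Sequence[Sequence[float]], q: Sequence[float], k: int) -> int:
    """Index of the point minimizing the robust distance to q (linear scan)."""
    return min(range(len(points)), key=lambda i: robust_distance(points[i], q, k))


points = [(0, 0, 0, 100), (5, 5, 5, 0)]
query = (1, 1, 1, 0)
print(robust_nearest(points, query, k=0))  # plain l1 nearest neighbor
print(robust_nearest(points, query, k=1))  # one corrupted coordinate tolerated
```

In this toy input the first point has a single wildly different coordinate, so the answer changes once one coordinate may be ignored, which also shows that the ignored coordinates depend on both the query and the candidate point.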

arXiv:1509.00118 [pdf, other]

doi 10.1145/2902251.2902287

Towards Tight Bounds for the Streaming Set Cover Problem

Authors: Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, Ali Vakilian

Abstract: We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets ($m\geq n$) we give an $O(1/δ)$-pass algorithm with a strongly sub-linear $\tilde{O}(mn^δ)$ space and logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [DIMV14] that uses an exponentially larger number of passes. We complement this result by showing that the tradeoff between the number of passes and space exhibited by our algorithm is tight, at least when the approximation factor is equal to $1$. Specifically, we show that any algorithm that computes set cover exactly using $({1 \over 2δ}-1)$ passes must use $\tilde{Ω}(mn^δ)$ space in the regime of $m=O(n)$. Furthermore, we consider the problem in the geometric setting where the elements are points in $\mathbb{R}^2$ and sets are either discs, axis-parallel rectangles, or fat triangles in the plane, and show that our algorithm (with a slight modification) uses the optimal $\tilde{O}(n)$ space to find a logarithmic approximation in $O(1/δ)$ passes. Finally, we show that any randomized one-pass algorithm that distinguishes between covers of size 2 and 3 must use a linear (i.e., $Ω(mn)$) amount of space. This is the first result showing that a randomized, approximate algorithm cannot achieve a space bound that is sublinear in the input size. This indicates that using multiple passes might be necessary in order to achieve sub-linear space bounds for this problem while guaranteeing small approximation factors.

Submitted 2 May, 2016; v1 submitted 31 August, 2015; originally announced September 2015.

Comments: A preliminary version of this paper is to appear in PODS 2016

Showing 1–24 of 24 results for author: Mahabadi, S