Showing 1–24 of 24 results for author: Mahabadi, S

Searching in archive cs.
  1. arXiv:2403.08917  [pdf, other]

    cs.CR cs.DS cs.LG

    Efficiently Computing Similarities to Private Datasets

    Authors: Arturs Backurs, Zinan Lin, Sepideh Mahabadi, Sandeep Silwal, Jakub Tarnawski

    Abstract: Many methods in differentially private model training rely on computing the similarity between a query point (such as public or synthetic data) and private data. We abstract out this common subroutine and study the following fundamental algorithmic problem: Given a similarity function $f$ and a large high-dimensional private dataset $X \subset \mathbb{R}^d$, output a differentially private (DP) da…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: To appear at ICLR 2024

  2. arXiv:2402.10806  [pdf, other]

    cs.DS

    Streaming Algorithms for Connectivity Augmentation

    Authors: Ce Jin, Michael Kapralov, Sepideh Mahabadi, Ali Vakilian

    Abstract: We study the $k$-connectivity augmentation problem ($k$-CAP) in the single-pass streaming model. Given a $(k-1)$-edge connected graph $G=(V,E)$ that is stored in memory, and a stream of weighted edges $L$ with weights in $\{0,1,\dots,W\}$, the goal is to choose a minimum weight subset $L'\subseteq L$ such that $G'=(V,E\cup L')$ is $k$-edge connected. We give a $(2+ε)$-approximation algorithm for t…

    Submitted 16 February, 2024; originally announced February 2024.

  3. arXiv:2310.08122  [pdf, other]

    cs.DS cs.LG

    Core-sets for Fair and Diverse Data Summarization

    Authors: Sepideh Mahabadi, Stojan Trajanovski

    Abstract: We study core-set construction algorithms for the task of Diversity Maximization under fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural divers…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  4. arXiv:2310.00175  [pdf, other]

    cs.DS cs.LG

    Tight Bounds for Volumetric Spanners and Applications

    Authors: Aditya Bhaskara, Sepideh Mahabadi, Ali Vakilian

    Abstract: Given a set of points of interest, a volumetric spanner is a subset of the points using which all the points can be expressed using "small" coefficients (measured in an appropriate norm). Formally, given a set of vectors $X = \{v_1, v_2, \dots, v_n\}$, the goal is to find $T \subseteq [n]$ such that every $v \in X$ can be expressed as $\sum_{i\in T} α_i v_i$, with $\|α\|$ being small. This notion,…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  5. arXiv:2309.15286  [pdf, other]

    cs.DS cs.LG

    Composable Coresets for Determinant Maximization: Greedy is Almost Optimal

    Authors: Siddharth Gollapudi, Sepideh Mahabadi, Varun Sivashankar

    Abstract: Given a set of $n$ vectors in $\mathbb{R}^d$, the goal of the determinant maximization problem is to pick $k$ vectors with the maximum volume. Determinant maximization is the MAP-inference task for determinantal point processes (DPP) and has recently received considerable attention for modeling diversity. As most applications for the problem use large amounts of data, this problem has been…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to NeurIPS 2023

    ACM Class: F.2.0; G.1.2; G.1.6; G.2.2

  6. arXiv:2307.04329  [pdf, ps, other]

    cs.DS cs.CG

    Improved Diversity Maximization Algorithms for Matching and Pseudoforest

    Authors: Sepideh Mahabadi, Shyam Narayanan

    Abstract: In this work we consider the diversity maximization problem, where given a data set $X$ of $n$ elements, and a parameter $k$, the goal is to pick a subset of $X$ of size $k$ maximizing a certain diversity measure. [CH01] defined a variety of diversity measures based on pairwise distances between the points. A constant factor approximation algorithm was known for all those diversity measures except…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: 27 pages, 1 table. Accepted to APPROX, 2023

  7. arXiv:2306.06778  [pdf, other]

    cs.LG cs.AI cs.DS

    Approximation Algorithms for Fair Range Clustering

    Authors: Sèdjro S. Hotegni, Sepideh Mahabadi, Ali Vakilian

    Abstract: This paper studies the fair range clustering problem in which the data points are from different demographic groups and the goal is to pick $k$ centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of $n$ points in a metric space $(P,d)$ where each point belongs to one o…

    Submitted 22 June, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

    Comments: ICML 2023

  8. arXiv:2211.00289  [pdf, ps, other]

    cs.DS cs.CG cs.DC

    Composable Coresets for Constrained Determinant Maximization and Beyond

    Authors: Sepideh Mahabadi, Thuy-Duong Vuong

    Abstract: We study the task of determinant maximization under partition constraint, in the context of large data sets. Given a point set $V\subset \mathbb{R}^d$ that is partitioned into $s$ groups $V_1,..., V_s$, and integers $k_1,...,k_s$ where $k=\sum_i k_i$, the goal is to pick $k_i$ points from group $i$ such that the overall determinant of the picked $k$ points is maximized. Determinant Maximization an…

    Submitted 1 November, 2022; originally announced November 2022.

  9. arXiv:2207.07822  [pdf, ps, other]

    cs.LG cs.DS

    Adaptive Sketches for Robust Regression with Importance Sampling

    Authors: Sepideh Mahabadi, David P. Woodruff, Samson Zhou

    Abstract: We introduce data structures for solving robust regression through stochastic gradient descent (SGD) by sampling gradients with probability proportional to their norm, i.e., importance sampling. Although SGD is widely used for large scale machine learning, it is well-known for possibly experiencing slow convergence rates due to the high variance from uniform sampling. On the other hand, importance…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: RANDOM 2022

  10. arXiv:2101.10905  [pdf, other]

    cs.DS cs.DB cs.LG

    Sampling a Near Neighbor in High Dimensions -- Who is the Fairest of Them All?

    Authors: Martin Aumüller, Sariel Har-Peled, Sepideh Mahabadi, Rasmus Pagh, Francesco Silvestri

    Abstract: Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points $S$ and a radius parameter $r>0$, the $r$-near neighbor ($r$-NN) problem asks for a data structure that, given any query point $q$, returns a point $p$ within distance at most $r$ from $q$. In this paper, we study the $r$-NN problem in the light of individual fairness a…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: text overlap with arXiv:1906.02640

  11. arXiv:2011.06545  [pdf, other]

    cs.DS cs.CG

    Towards Better Approximation of Graph Crossing Number

    Authors: Julia Chuzhoy, Sepideh Mahabadi, Zihan Tan

    Abstract: Graph Crossing Number is a fundamental problem with various applications. In this problem, the goal is to draw an input graph $G$ in the plane so as to minimize the number of crossings between the images of its edges. Despite extensive work, non-trivial approximation algorithms are only known for bounded-degree graphs. Even for this special case, the best current algorithm achieves a…

    Submitted 10 January, 2021; v1 submitted 12 November, 2020; originally announced November 2020.

  12. arXiv:2007.03633  [pdf, ps, other]

    cs.DS cs.CC cs.CG cs.LG

    Streaming Complexity of SVMs

    Authors: Alexandr Andoni, Collin Burns, Yi Li, Sepideh Mahabadi, David P. Woodruff

    Abstract: We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn lots of attention, including for developing fast algorithms for solving the problem approximately. One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires onl…

    Submitted 7 July, 2020; originally announced July 2020.

    Comments: APPROX 2020

  13. arXiv:2004.10969  [pdf, ps, other]

    cs.DS cs.CG cs.LG

    Non-Adaptive Adaptive Sampling on Turnstile Streams

    Authors: Sepideh Mahabadi, Ilya Razenshteyn, David P. Woodruff, Samson Zhou

    Abstract: Adaptive sampling is a useful algorithmic tool for data summarization problems in the classical centralized setting, where the entire dataset is available to the single processor performing the computation. Adaptive sampling repeatedly selects rows of an underlying matrix $\mathbf{A}\in\mathbb{R}^{n\times d}$, where $n\gg d$, with probabilities proportional to their distances to the subspace of th…

    Submitted 23 April, 2020; originally announced April 2020.

    Comments: To appear at STOC 2020

  14. arXiv:2002.06742  [pdf, other]

    cs.DS cs.LG stat.ML

    Individual Fairness for $k$-Clustering

    Authors: Sepideh Mahabadi, Ali Vakilian

    Abstract: We give a local search based algorithm for $k$-median and $k$-means (and more generally for any $k$-clustering with $\ell_p$ norm cost function) from the perspective of individual fairness. More precisely, for a point $x$ in a point set $P$ of size $n$, let $r(x)$ be the minimum radius such that the ball of radius $r(x)$ centered at $x$ has at least $n/k$ points from $P$. Intuitively, if a set of…

    Submitted 21 September, 2020; v1 submitted 16 February, 2020; originally announced February 2020.

    Comments: ICML 2020

  15. arXiv:1907.03197  [pdf, other]

    cs.DS cs.LG

    Composable Core-sets for Determinant Maximization: A Simple Near-Optimal Algorithm

    Authors: Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, Alireza Rezaei

    Abstract: "Composable core-sets" are an efficient framework for solving optimization problems in massive data models. In this work, we consider efficient construction of composable core-sets for the determinant maximization problem. This can also be cast as the MAP inference task for determinantal point processes, that have recently gained a lot of interest for modeling diversity and fairness. The problem…

    Submitted 6 July, 2019; originally announced July 2019.

    Comments: This paper has appeared in the 36th International Conference on Machine Learning (ICML), 2019. This is an equal contribution paper

    ACM Class: F.2.0; G.1.2; G.1.6; G.2.2

  16. arXiv:1906.02640  [pdf, other]

    cs.LG cs.CG cs.DS stat.ML

    Near Neighbor: Who is the Fairest of Them All?

    Authors: Sariel Har-Peled, Sepideh Mahabadi

    Abstract: In this work we study a fair variant of the near neighbor problem. Namely, given a set of $n$ points $P$ and a parameter $r$, the goal is to preprocess the points, such that given a query point $q$, any point in the $r$-neighborhood of the query, i.e., $\mathbb{B}(q,r)…

    Submitted 21 November, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: To appear in NIPS 2019

  17. arXiv:1902.03534  [pdf, ps, other]

    cs.DS cs.DM

    Set Cover in Sub-linear Time

    Authors: Piotr Indyk, Sepideh Mahabadi, Ronitt Rubinfeld, Ali Vakilian, Anak Yodpinyanee

    Abstract: We study the classic set cover problem from the perspective of sub-linear algorithms. Given access to a collection of $m$ sets over $n$ elements in the query model, we show that sub-linear algorithms derived from existing techniques have almost tight query complexities. On one hand, first we show an adaptation of the streaming algorithm presented in Har-Peled et al. [2016] to the sub-linear quer…

    Submitted 9 February, 2019; originally announced February 2019.

  18. arXiv:1811.03591  [pdf, other]

    cs.DS cs.CG cs.LG math.MG

    Nonlinear Dimension Reduction via Outer Bi-Lipschitz Extensions

    Authors: Sepideh Mahabadi, Konstantin Makarychev, Yury Makarychev, Ilya Razenshteyn

    Abstract: We introduce and study the notion of an outer bi-Lipschitz extension of a map between Euclidean spaces. The notion is a natural analogue of the notion of a Lipschitz extension of a Lipschitz map. We show that for every map $f$ there exists an outer bi-Lipschitz extension $f'$ whose distortion is greater than that of $f$ by at most a constant factor. This result can be seen as a counterpart of the…

    Submitted 8 November, 2018; originally announced November 2018.

    Comments: 27 pages, 6 figures; an extended abstract appeared in the proceedings of STOC 2018

  19. arXiv:1807.11648  [pdf, ps, other]

    cs.DS cs.LG stat.ML

    Composable Core-sets for Determinant Maximization Problems via Spectral Spanners

    Authors: Piotr Indyk, Sepideh Mahabadi, Shayan Oveis Gharan, Alireza Rezaei

    Abstract: We study a spectral generalization of classical combinatorial graph spanners to the spectral setting. Given a set of vectors $V\subseteq \Re^d$, we say a set $U\subseteq V$ is an $α$-spectral spanner if for all $v\in V$ there is a probability distribution $μ_v$ supported on $U$ such that $$vv^\intercal \preceq α\cdot\mathbb{E}_{u\simμ_v} uu^\intercal.$$ We show that any set $V$ has an…

    Submitted 16 November, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

    Comments: To appear in SODA 2020

  20. arXiv:1704.02546  [pdf, ps, other]

    cs.CG

    LSH on the Hypercube Revisited

    Authors: Sariel Har-Peled, Sepideh Mahabadi

    Abstract: LSH (locality sensitive hashing) had emerged as a powerful technique in nearest-neighbor search in high dimensions [IM98, HIM12]. Given a point set $P$ in a metric space, and given parameters $r$ and $\varepsilon > 0$, the task is to preprocess the point set, such that given a query point $q$, one can quickly decide if $q$ is in distance at most $\leq r$ or $\geq (1+\varepsilon)r$ from the point s…

    Submitted 8 April, 2017; originally announced April 2017.

  21. Approximate Sparse Linear Regression

    Authors: Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi

    Abstract: In the Sparse Linear Regression (SLR) problem, given a $d \times n$ matrix $M$ and a $d$-dimensional query $q$, the goal is to compute a $k$-sparse $n$-dimensional vector $τ$ such that the error $||M τ-q||$ is minimized. This problem is equivalent to the following geometric problem: given a set $P$ of $n$ points and a query point $q$ in $d$ dimensions, find the closest $k$-dimensional subspace to…

    Submitted 28 April, 2018; v1 submitted 27 September, 2016; originally announced September 2016.

  22. Simultaneous Nearest Neighbor Search

    Authors: Piotr Indyk, Robert Kleinberg, Sepideh Mahabadi, Yang Yuan

    Abstract: Motivated by applications in computer vision and databases, we introduce and study the Simultaneous Nearest Neighbor Search (SNN) problem. Given a set of data points, the goal of SNN is to design a data structure that, given a collection of queries, finds a collection of close points that are compatible with each other. Formally, we are given $k$ query points $Q=q_1,\cdots,q_k$, and a compatibilit…

    Submitted 7 April, 2016; originally announced April 2016.

    ACM Class: F.2.2

  23. arXiv:1511.07357  [pdf, ps, other]

    cs.CG

    Proximity in the Age of Distraction: Robust Approximate Nearest Neighbor Search

    Authors: Sariel Har-Peled, Sepideh Mahabadi

    Abstract: We introduce a new variant of the nearest neighbor search problem, which allows for some coordinates of the dataset to be arbitrarily corrupted or unknown. Formally, given a dataset of $n$ points $P=\{ x_1,\ldots, x_n\}$ in high-dimensions, and a parameter $k$, the goal is to preprocess the dataset, such that given a query point $q$, one can compute quickly a point $x \in P$, such that the distanc…

    Submitted 23 November, 2015; originally announced November 2015.

  24. Towards Tight Bounds for the Streaming Set Cover Problem

    Authors: Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, Ali Vakilian

    Abstract: We consider the classic Set Cover problem in the data stream model. For $n$ elements and $m$ sets ($m\geq n$) we give a $O(1/δ)$-pass algorithm with a strongly sub-linear $\tilde{O}(mn^δ)$ space and logarithmic approximation factor. This yields a significant improvement over the earlier algorithm of Demaine et al. [DIMV14] that uses exponentially larger number of passes. We complement this result…

    Submitted 2 May, 2016; v1 submitted 31 August, 2015; originally announced September 2015.

    Comments: A preliminary version of this paper is to appear in PODS 2016