Search | arXiv e-print repository

On Computationally Efficient Multi-Class Calibration

Authors: Parikshit Gopalan, Lunjia Hu, Guy N. Rothblum

Abstract: Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration… ▽ More Consider a multi-class labelling problem, where the labels can take values in $[k]$, and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in $k$? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in $k$, or needing to solve computationally intractable problems, or give rather weak guarantees. Our main contribution is a notion of calibration that achieves all these desiderata: we formulate a robust notion of projected smooth calibration for multi-class predictions, and give new recalibration algorithms for efficiently calibrating predictors under this definition with complexity polynomial in $k$. Projected smooth calibration gives strong guarantees for all downstream decision makers who want to use the predictor for binary classification problems of the form: does the label belong to a subset $T \subseteq [k]$: e.g. is this an image of an animal? It ensures that the probabilities predicted by summing the probabilities assigned to labels in $T$ are close to some perfectly calibrated binary predictor for that task. We also show that natural strengthenings of our definition are computationally hard to achieve: they run into information theoretic barriers or computational intractability. Underlying both our upper and lower bounds is a tight connection that we prove between multi-class calibration and the well-studied problem of agnostic learning in the (standard) binary prediction setting. △ Less

Submitted 8 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: In COLT 2024

arXiv:2401.14645 [pdf, ps, other]

Omnipredictors for Regression and the Approximate Rank of Convex Functions

Authors: Parikshit Gopalan, Princewill Okoroafor, Prasad Raghavendra, Abhishek Shetty, Mihir Singhal

Abstract: Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of… ▽ More Consider the supervised learning setting where the goal is to learn to predict labels $\mathbf y$ given points $\mathbf x$ from a distribution. An \textit{omnipredictor} for a class $\mathcal L$ of loss functions and a class $\mathcal C$ of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in $\mathcal C$ for every loss in $\mathcal L$. Since the work of [GKR+21] that introduced the notion, there has been a large body of work in the setting of binary labels where $\mathbf y \in \{0, 1\}$, but much less is known about the regression setting where $\mathbf y \in [0,1]$ can be continuous. Our main conceptual contribution is the notion of \textit{sufficient statistics} for loss minimization over a family of loss functions: these are a set of statistics about a distribution such that knowing them allows one to take actions that minimize the expected loss for any loss in the family. The notion of sufficient statistics relates directly to the approximate rank of the family of loss functions. Our key technical contribution is a bound of $O(1/\varepsilon^{2/3})$ on the $ε$-approximate rank of convex, Lipschitz functions on the interval $[0,1]$, which we show is tight up to a factor of $\mathrm{polylog} (1/ε)$. This yields improved runtimes for learning omnipredictors for the class of all convex, Lipschitz loss functions under weak learnability assumptions about the class $\mathcal C$. We also give efficient omnipredictors when the loss families have low-degree polynomial approximations, or arise from generalized linear models (GLMs). This translation from sufficient statistics to faster omnipredictors is made possible by lifting the technique of loss outcome indistinguishability introduced by [GKH+23] for Boolean labels to the regression setting. △ Less

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2306.10615 [pdf, ps, other]

Agnostically Learning Single-Index Models using Omnipredictors

Authors: Aravind Gollakota, Parikshit Gopalan, Adam R. Klivans, Konstantinos Stavropoulos

Abstract: We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boun… ▽ More We give the first result for agnostically learning Single-Index Models (SIMs) with arbitrary monotone and Lipschitz activations. All prior work either held only in the realizable setting or required the activation to be known. Moreover, we only require the marginal to have bounded second moments, whereas all prior work required stronger distributional assumptions (such as anticoncentration or boundedness). Our algorithm is based on recent work by [GHK$^+$23] on omniprediction using predictors satisfying calibrated multiaccuracy. Our analysis is simple and relies on the relationship between Bregman divergences (or matching losses) and $\ell_p$ distances. We also provide new guarantees for standard algorithms like GLMtron and logistic regression in the agnostic setting. △ Less

Submitted 18 June, 2023; originally announced June 2023.

Comments: 21 pages

arXiv:2305.18764 [pdf, other]

When Does Optimizing a Proper Loss Yield Calibration?

Authors: Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

Abstract: Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors, that are unlikely to contain the… ▽ More Optimizing proper loss functions is popularly believed to yield predictors with good calibration properties; the intuition being that for such losses, the global optimum is to predict the ground-truth probabilities, which is indeed calibrated. However, typical machine learning models are trained to approximately minimize loss over restricted families of predictors, that are unlikely to contain the ground truth. Under what circumstances does optimizing proper loss over a restricted family yield calibrated models? What precise calibration guarantees does it give? In this work, we provide a rigorous answer to these questions. We replace the global optimality with a local optimality condition stipulating that the (proper) loss of the predictor cannot be reduced much by post-processing its predictions with a certain family of Lipschitz functions. We show that any predictor with this local optimality satisfies smooth calibration as defined in Kakade-Foster (2008), Błasiok et al. (2023). Local optimality is plausibly satisfied by well-trained DNNs, which suggests an explanation for why they are calibrated from proper loss minimization alone. Finally, we show that the connection between local optimality and calibration error goes both ways: nearly calibrated predictors are also nearly locally optimal. △ Less

Submitted 8 December, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: In NeurIPS 2023. Selected for spotlight presentation

arXiv:2304.09424 [pdf, other]

Loss Minimization Yields Multicalibration for Large Neural Networks

Authors: Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Adam Tauman Kalai, Preetum Nakkiran

Abstract: Multicalibration is a notion of fairness for predictors that requires them to provide calibrated predictions across a large set of protected groups. Multicalibration is known to be a distinct goal than loss minimization, even for simple predictors such as linear functions. In this work, we consider the setting where the protected groups can be represented by neural networks of size $k$, and the… ▽ More Multicalibration is a notion of fairness for predictors that requires them to provide calibrated predictions across a large set of protected groups. Multicalibration is known to be a distinct goal than loss minimization, even for simple predictors such as linear functions. In this work, we consider the setting where the protected groups can be represented by neural networks of size $k$, and the predictors are neural networks of size $n > k$. We show that minimizing the squared loss over all neural nets of size $n$ implies multicalibration for all but a bounded number of unlucky values of $n$. We also give evidence that our bound on the number of unlucky values is tight, given our proof technique. Previously, results of the flavor that loss minimization yields multicalibration were known only for predictors that were near the ground truth, hence were rather limited in applicability. Unlike these, our results rely on the expressivity of neural nets and utilize the representation of the predictor. △ Less

Submitted 7 December, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

Comments: In ITCS 2024

arXiv:2302.06726 [pdf, other]

Swap Agnostic Learning, or Characterizing Omniprediction via Multicalibration

Authors: Parikshit Gopalan, Michael P. Kim, Omer Reingold

Abstract: We introduce and study Swap Agnostic Learning. The problem can be phrased as a game between a predictor and an adversary: first, the predictor selects a hypothesis $h$; then, the adversary plays in response, and for each level set of the predictor $\{x \in \mathcal{X} : h(x) = v\}$ selects a (different) loss-minimizing hypothesis $c_v \in \mathcal{C}$; the predictor wins if $h$ competes with the a… ▽ More We introduce and study Swap Agnostic Learning. The problem can be phrased as a game between a predictor and an adversary: first, the predictor selects a hypothesis $h$; then, the adversary plays in response, and for each level set of the predictor $\{x \in \mathcal{X} : h(x) = v\}$ selects a (different) loss-minimizing hypothesis $c_v \in \mathcal{C}$; the predictor wins if $h$ competes with the adaptive adversary's loss. Despite the strength of the adversary, we demonstrate the feasibility Swap Agnostic Learning for any convex loss. Somewhat surprisingly, the result follows through an investigation into the connections between Omniprediction and Multicalibration. Omniprediction is a new notion of optimality for predictors that strengthtens classical notions such as agnostic learning. It asks for loss minimization guarantees (relative to a hypothesis class) that apply not just for a specific loss function, but for any loss belonging to a rich family of losses. A recent line of work shows that omniprediction is implied by multicalibration and related multi-group fairness notions. This unexpected connection raises the question: is multi-group fairness necessary for omniprediction? Our work gives the first affirmative answer to this question. We establish an equivalence between swap variants of omniprediction and multicalibration and swap agnostic learning. Further, swap multicalibration is essentially equivalent to the standard notion of multicalibration, so existing learning algorithms can be used to achieve any of the three notions. Building on this characterization, we paint a complete picture of the relationship between different variants of multi-group fairness, omniprediction, and Outcome Indistinguishability. This inquiry reveals a unified notion of OI that captures all existing notions of omniprediction and multicalibration. △ Less

Submitted 21 January, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

MSC Class: 68T05; 68Q32

arXiv:2211.16886 [pdf, other]

A Unifying Theory of Distance from Calibration

Authors: Jarosław Błasiok, Parikshit Gopalan, Lunjia Hu, Preetum Nakkiran

Abstract: We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well-understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular me… ▽ More We study the fundamental question of how to define and measure the distance from calibration for probabilistic predictors. While the notion of perfect calibration is well-understood, there is no consensus on how to quantify the distance from perfect calibration. Numerous calibration measures have been proposed in the literature, but it is unclear how they compare to each other, and many popular measures such as Expected Calibration Error (ECE) fail to satisfy basic properties like continuity. We present a rigorous framework for analyzing calibration measures, inspired by the literature on property testing. We propose a ground-truth notion of distance from calibration: the $\ell_1$ distance to the nearest perfectly calibrated predictor. We define a consistent calibration measure as one that is polynomially related to this distance. Applying our framework, we identify three calibration measures that are consistent and can be estimated efficiently: smooth calibration, interval calibration, and Laplace kernel calibration. The former two give quadratic approximations to the ground truth distance, which we show is information-theoretically optimal in a natural model for measuring calibration which we term the prediction-only access model. Our work thus establishes fundamental lower and upper bounds on measuring the distance to calibration, and also provides theoretical justification for preferring certain metrics (like Laplace kernel calibration) in practice. △ Less

Submitted 31 March, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: In STOC 2023

arXiv:2210.08649 [pdf, other]

Loss Minimization through the Lens of Outcome Indistinguishability

Authors: Parikshit Gopalan, Lunjia Hu, Michael P. Kim, Omer Reingold, Udi Wieder

Abstract: We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistingusihability. For a collection of losses and hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic t… ▽ More We present a new perspective on loss minimization and the recent notion of Omniprediction through the lens of Outcome Indistingusihability. For a collection of losses and hypothesis class, omniprediction requires that a predictor provide a loss-minimization guarantee simultaneously for every loss in the collection compared to the best (loss-specific) hypothesis in the class. We present a generic template to learn predictors satisfying a guarantee we call Loss Outcome Indistinguishability. For a set of statistical tests--based on a collection of losses and hypothesis class--a predictor is Loss OI if it is indistinguishable (according to the tests) from Nature's true probabilities over outcomes. By design, Loss OI implies omniprediction in a direct and intuitive manner. We simplify Loss OI further, decomposing it into a calibration condition plus multiaccuracy for a class of functions derived from the loss and hypothesis classes. By careful analysis of this class, we give efficient constructions of omnipredictors for interesting classes of loss functions, including non-convex losses. This decomposition highlights the utility of a new multi-group fairness notion that we call calibrated multiaccuracy, which lies in between multiaccuracy and multicalibration. We show that calibrated multiaccuracy implies Loss OI for the important set of convex losses arising from Generalized Linear Models, without requiring full multicalibration. For such losses, we show an equivalence between our computational notion of Loss OI and a geometric notion of indistinguishability, formulated as Pythagorean theorems in the associated Bregman divergence. We give an efficient algorithm for calibrated multiaccuracy with computational complexity comparable to that of multiaccuracy. In all, calibrated multiaccuracy offers an interesting tradeoff point between efficiency and generality in the omniprediction landscape. △ Less

Submitted 8 December, 2022; v1 submitted 16 October, 2022; originally announced October 2022.

arXiv:2203.01255 [pdf, other]

Low-Degree Multicalibration

Authors: Parikshit Gopalan, Michael P. Kim, Mihir Singhal, Shengjia Zhao

Abstract: Introduced as a notion of algorithmic fairness, multicalibration has proved to be a powerful and versatile concept with implications far beyond its original intent. This stringent notion -- that predictions be well-calibrated across a rich class of intersecting subpopulations -- provides its strong guarantees at a cost: the computational and sample complexity of learning multicalibrated predictors… ▽ More Introduced as a notion of algorithmic fairness, multicalibration has proved to be a powerful and versatile concept with implications far beyond its original intent. This stringent notion -- that predictions be well-calibrated across a rich class of intersecting subpopulations -- provides its strong guarantees at a cost: the computational and sample complexity of learning multicalibrated predictors are high, and grow exponentially with the number of class labels. In contrast, the relaxed notion of multiaccuracy can be achieved more efficiently, yet many of the most desirable properties of multicalibration cannot be guaranteed assuming multiaccuracy alone. This tension raises a key question: Can we learn predictors with multicalibration-style guarantees at a cost commensurate with multiaccuracy? In this work, we define and initiate the study of Low-Degree Multicalibration. Low-Degree Multicalibration defines a hierarchy of increasingly-powerful multi-group fairness notions that spans multiaccuracy and the original formulation of multicalibration at the extremes. Our main technical contribution demonstrates that key properties of multicalibration, related to fairness and accuracy, actually manifest as low-degree properties. Importantly, we show that low-degree multicalibration can be significantly more efficient than full multicalibration. In the multi-class setting, the sample complexity to achieve low-degree multicalibration improves exponentially (in the number of classes) over full multicalibration. Our work presents compelling evidence that low-degree multicalibration represents a sweet spot, pairing computational and sample efficiency with strong fairness and accuracy guarantees. △ Less

Submitted 16 June, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

Comments: Appears at COLT'22

arXiv:2202.13576 [pdf, other]

KL Divergence Estimation with Multi-group Attribution

Authors: Parikshit Gopalan, Nina Narodytska, Omer Reingold, Vatsal Sharan, Udi Wieder

Abstract: Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) f… ▽ More Estimating the Kullback-Leibler (KL) divergence between two distributions given samples from them is well-studied in machine learning and information theory. Motivated by considerations of multi-group fairness, we seek KL divergence estimates that accurately reflect the contributions of sub-populations to the overall divergence. We model the sub-populations coming from a rich (possibly infinite) family $\mathcal{C}$ of overlapping subsets of the domain. We propose the notion of multi-group attribution for $\mathcal{C}$, which requires that the estimated divergence conditioned on every sub-population in $\mathcal{C}$ satisfies some natural accuracy and fairness desiderata, such as ensuring that sub-populations where the model predicts significant divergence do diverge significantly in the two distributions. Our main technical contribution is to show that multi-group attribution can be derived from the recently introduced notion of multi-calibration for importance weights [HKRR18, GRSW21]. We provide experimental evidence to support our theoretical results, and show that multi-group attribution provides better KL divergence estimates when conditioned on sub-populations than other popular algorithms. △ Less

Submitted 28 February, 2022; originally announced February 2022.

Comments: 20 pages, 4 figures

arXiv:2109.05389 [pdf, other]

Omnipredictors

Authors: Parikshit Gopalan, Adam Tauman Kalai, Omer Reingold, Vatsal Sharan, Udi Wieder

Abstract: Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?''). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be kn… ▽ More Loss minimization is a dominant paradigm in machine learning, where a predictor is trained to minimize some loss function that depends on an uncertain event (e.g., "will it rain tomorrow?''). Different loss functions imply different learning algorithms and, at times, very different predictors. While widespread and appealing, a clear drawback of this approach is that the loss function may not be known at the time of learning, requiring the algorithm to use a best-guess loss function. We suggest a rigorous new paradigm for loss minimization in machine learning where the loss function can be ignored at the time of learning and only be taken into account when deciding an action. We introduce the notion of an (${\mathcal{L}},\mathcal{C}$)-omnipredictor, which could be used to optimize any loss in a family ${\mathcal{L}}$. Once the loss function is set, the outputs of the predictor can be post-processed (a simple univariate data-independent transformation of individual predictions) to do well compared with any hypothesis from the class $\mathcal{C}$. The post processing is essentially what one would perform if the outputs of the predictor were true probabilities of the uncertain events. In a sense, omnipredictors extract all the predictive power from the class $\mathcal{C}$, irrespective of the loss function in $\mathcal{L}$. We show that such "loss-oblivious'' learning is feasible through a connection to multicalibration, a notion introduced in the context of algorithmic fairness. In addition, we show how multicalibration can be viewed as a solution concept for agnostic boosting, shedding new light on past results. Finally, we transfer our insights back to the context of algorithmic fairness by providing omnipredictors for multi-group loss minimization. △ Less

Submitted 11 September, 2021; originally announced September 2021.

Comments: 35 pages, 1 figure

arXiv:2103.05853 [pdf, ps, other]

Multicalibrated Partitions for Importance Weights

Authors: Parikshit Gopalan, Omer Reingold, Vatsal Sharan, Udi Wieder

Abstract: The ratio between the probability that two distributions $R$ and $P$ give to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably, statistics and machine learning. Among its applications, importance weights are central to domain adaptation, anomaly detection, and estimations of various divergences such as the KL divergen… ▽ More The ratio between the probability that two distributions $R$ and $P$ give to points $x$ are known as importance weights or propensity scores and play a fundamental role in many different fields, most notably, statistics and machine learning. Among its applications, importance weights are central to domain adaptation, anomaly detection, and estimations of various divergences such as the KL divergence. We consider the common setting where $R$ and $P$ are only given through samples from each distribution. The vast literature on estimating importance weights is either heuristic, or makes strong assumptions about $R$ and $P$ or on the importance weights themselves. In this paper, we explore a computational perspective to the estimation of importance weights, which factors in the limitations and possibilities obtainable with bounded computational resources. We significantly strengthen previous work that use the MaxEntropy approach, that define the importance weights based on a distribution $Q$ closest to $P$, that looks the same as $R$ on every set $C \in \mathcal{C}$, where $\mathcal{C}$ may be a huge collection of sets. We show that the MaxEntropy approach may fail to assign high average scores to sets $C \in \mathcal{C}$, even when the average of ground truth weights for the set is evidently large. We similarly show that it may overestimate the average scores to sets $C \in \mathcal{C}$. We therefore formulate Sandwiching bounds as a notion of set-wise accuracy for importance weights. We study these bounds to show that they capture natural completeness and soundness requirements from the weights. We present an efficient algorithm that under standard learnability assumptions computes weights which satisfy these bounds. Our techniques rely on a new notion of multicalibrated partitions of the domain of the distributions, which appear to be useful objects in their own right. △ Less

Submitted 9 March, 2021; originally announced March 2021.

Comments: 27 pages

arXiv:2006.12018 [pdf, other]

Overlook: Differentially Private Exploratory Visualization for Big Data

Authors: Pratiksha Thaker, Mihai Budiu, Parikshit Gopalan, Udi Wieder, Matei Zaharia

Abstract: Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a s… ▽ More Data exploration systems that provide differential privacy must manage a privacy budget that measures the amount of privacy lost across multiple queries. One effective strategy to manage the privacy budget is to compute a one-time private synopsis of the data, to which users can make an unlimited number of queries. However, existing systems using synopses are built for offline use cases, where a set of queries is known ahead of time and the system carefully optimizes a synopsis for it. The synopses that these systems build are costly to compute and may also be costly to store. We introduce Overlook, a system that enables private data exploration at interactive latencies for both data analysts and data curators. The key idea in Overlook is a virtual synopsis that can be evaluated incrementally, without extra space storage or expensive precomputation. Overlook simply executes queries using an existing engine, such as a SQL DBMS, and adds noise to their results. Because Overlook's synopses do not require costly precomputation or storage, data curators can also use Overlook to explore the impact of privacy parameters interactively. Overlook offers a rich visual query interface based on the open source Hillview system. Overlook achieves accuracy comparable to existing synopsis-based systems, while offering better performance and removing the need for extra storage. △ Less

Submitted 22 June, 2020; originally announced June 2020.

arXiv:1912.03582 [pdf, other]

PIDForest: Anomaly Detection via Partial Identification

Authors: Parikshit Gopalan, Vatsal Sharan, Udi Wieder

Abstract: We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum densit… ▽ More We consider the problem of detecting anomalies in a large dataset. We propose a framework called Partial Identification which captures the intuition that anomalies are easy to distinguish from the overwhelming majority of points by relatively few attribute values. Formalizing this intuition, we propose a geometric anomaly measure for a point that we call PIDScore, which measures the minimum density of data points over all subcubes containing the point. We present PIDForest: a random forest based algorithm that finds anomalies based on this definition. We show that it performs favorably in comparison to several popular anomaly detection methods, across a broad range of benchmarks. PIDForest also provides a succinct explanation for why a point is labelled anomalous, by providing a set of features and ranges for them which are relatively uncommon in the dataset. △ Less

Submitted 7 December, 2019; originally announced December 2019.

arXiv:1911.07378 [pdf, ps, other]

Finding Skewed Subcubes Under a Distribution

Authors: Parikshit Gopalan, Roie Levin, Udi Wieder

Abstract: Say that we are given samples from a distribution $ψ$ over an $n$-dimensional space. We expect or desire $ψ$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $ψ$ deviates markedly from what we expect; we refer to such subcubes as skewed subcu… ▽ More Say that we are given samples from a distribution $ψ$ over an $n$-dimensional space. We expect or desire $ψ$ to behave like a product distribution (or a $k$-wise independent distribution over its marginals for small $k$). We propose the problem of enumerating/list-decoding all large subcubes where the distribution $ψ$ deviates markedly from what we expect; we refer to such subcubes as skewed subcubes. Skewed subcubes are certificates of dependencies between small subsets of variables in $ψ$. We motivate this problem by showing that it arises naturally in the context of algorithmic fairness and anomaly detection. In this work we focus on the special but important case where the space is the Boolean hypercube, and the expected marginals are uniform. We show that the obvious definition of skewed subcubes can lead to intractable list sizes, and propose a better definition of a minimal skewed subcube, which are subcubes whose skew cannot be attributed to a larger subcube that contains it. Our main technical contribution is a list-size bound for this definition and an algorithm to efficiently find all such subcubes. Both the bound and the algorithm rely on Fourier-analytic techniques, especially the powerful hypercontractive inequality. On the lower bounds side, we show that finding skewed subcubes is as hard as the sparse noisy parity problem, and hence our algorithms cannot be improved on substantially without a breakthrough on this problem which is believed to be intractable. Motivated by this, we study alternate models allowing query access to $ψ$ where finding skewed subcubes might be easier. △ Less

Submitted 12 November, 2020; v1 submitted 17 November, 2019; originally announced November 2019.

arXiv:1907.04827 [pdf, other]

doi 10.14778/3342263.3342279

Hillview: A trillion-cell spreadsheet for big data

Authors: Mihai Budiu, Parikshit Gopalan, Lalith Suresh, Udi Wieder, Han Kruiger, Marcos K. Aguilera

Abstract: Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketche… ▽ More Hillview is a distributed spreadsheet for browsing very large datasets that cannot be handled by a single machine. As a spreadsheet, Hillview provides a high degree of interactivity that permits data analysts to explore information quickly along many dimensions while switching visualizations on a whim. To provide the required responsiveness, Hillview introduces visualization sketches, or vizketches, as a simple idea to produce compact data visualizations. Vizketches combine algorithmic techniques for data summarization with computer graphics principles for efficient rendering. While simple, vizketches are effective at scaling the spreadsheet by parallelizing computation, reducing communication, providing progressive visualizations, and offering precise accuracy guarantees. Using Hillview running on eight servers, we can navigate and visualize datasets of tens of billions of rows and trillions of cells, much beyond the published capabilities of competing systems. △ Less

Submitted 10 July, 2019; originally announced July 2019.

arXiv:1804.03065 [pdf, other]

Efficient Anomaly Detection via Matrix Sketching

Authors: Vatsal Sharan, Parikshit Gopalan, Udi Wieder

Abstract: We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results sho… ▽ More We consider the problem of finding anomalies in high-dimensional data using popular PCA based anomaly scores. The naive algorithms for computing these scores explicitly compute the PCA of the covariance matrix which uses space quadratic in the dimensionality of the data. We give the first streaming algorithms that use space that is linear or sublinear in the dimension. We prove general results showing that \emph{any} sketch of a matrix that satisfies a certain operator norm guarantee can be used to approximate these scores. We instantiate these results with powerful matrix sketching techniques such as Frequent Directions and random projections to derive efficient and practical algorithms for these problems, which we validate over real-world data sets. Our main technical contribution is to prove matrix perturbation inequalities for operators arising in the computation of these measures. △ Less

Submitted 27 November, 2018; v1 submitted 9 April, 2018; originally announced April 2018.

Comments: Updates for NeurIPS'18 camera-ready

arXiv:1803.03620 [pdf, other]

Stable and Consistent Membership at Scale with Rapid

Authors: Lalith Suresh, Dahlia Malkhi, Parikshit Gopalan, Ivan Porto Carreiro, Zeeshan Lokhandwala

Abstract: We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experience problems, it detects failures of the entire group, rather than conclude about each… ▽ More We present the design and evaluation of Rapid, a distributed membership service. At Rapid's core is a scheme for multi-process cut detection (CD) that revolves around two key insights: (i) it suspects a failure of a process only after alerts arrive from multiple sources, and (ii) when a group of processes experience problems, it detects failures of the entire group, rather than conclude about each process individually. Implementing these insights translates into a simple membership algorithm with low communication overhead. We present evidence that our strategy suffices to drive unanimous detection almost-everywhere, even when complex network conditions arise, such as one-way reachability problems, firewall misconfigurations, and high packet loss. Furthermore, we present both empirical evidence and analyses that proves that the almost-everywhere detection happens with high probability. To complete the design, Rapid contains a leaderless consensus protocol that converts multi-process cut detections into a view-change decision. The resulting membership service works both in fully decentralized as well as logically centralized modes. We present an evaluation of Rapid in moderately scalable cloud settings. Rapid bootstraps 2000 node clusters 2-5.8x faster than prevailing tools such as Memberlist and ZooKeeper, remains stable in face of complex failure scenarios, and provides strong consistency guarantees. It is easy to integrate Rapid into existing distributed applications, of which we demonstrate two. △ Less

Submitted 9 March, 2018; originally announced March 2018.

Comments: 15 pages

arXiv:1605.05412 [pdf, other]

Maximally Recoverable Codes for Grid-like Topologies

Authors: Parikshit Gopalan, Guangda Hu, Swastik Kopparty, Shubhangi Saraf, Carol Wang, Sergey Yekhanin

Abstract: The explosion in the volumes of data being stored online has resulted in distributed storage systems transitioning to erasure coding based schemes. Yet, the codes being deployed in practice are fairly short. In this work, we address what we view as the main coding theoretic barrier to deploying longer codes in storage: at large lengths, failures are not independent and correlated failures are inev… ▽ More The explosion in the volumes of data being stored online has resulted in distributed storage systems transitioning to erasure coding based schemes. Yet, the codes being deployed in practice are fairly short. In this work, we address what we view as the main coding theoretic barrier to deploying longer codes in storage: at large lengths, failures are not independent and correlated failures are inevitable. This motivates designing codes that allow quick data recovery even after large correlated failures, and which have efficient encoding and decoding. We propose that code design for distributed storage be viewed as a two-step process. The first step is choose a topology of the code, which incorporates knowledge about the correlated failures that need to be handled, and ensures local recovery from such failures. In the second step one specifies a code with the chosen topology by choosing coefficients from a finite field. In this step, one tries to balance reliability (which is better over larger fields) with encoding and decoding efficiency (which is better over smaller fields). This work initiates an in-depth study of this reliability/efficiency tradeoff. We consider the field-size needed for achieving maximal recoverability: the strongest reliability possible with a given topology. We propose a family of topologies called grid-like topologies which unify a number of topologies considered both in theory and practice, and prove a collection of results about maximally recoverable codes in such topologies including the first super-polynomial lower bound on the field size. △ Less

Submitted 20 September, 2016; v1 submitted 17 May, 2016; originally announced May 2016.

arXiv:1604.07432 [pdf, ps, other]

Degree and Sensitivity: tails of two distributions

Authors: Parikshit Gopalan, Rocco Servedio, Avishay Tal, Avi Wigderson

Abstract: The sensitivity of a Boolean function f is the maximum over all inputs x, of the number of sensitive coordinates of x. The well-known sensitivity conjecture of Nisan (see also Nisan and Szegedy) states that every sensitivity-s Boolean function can be computed by a polynomial over the reals of degree poly(s). The best known upper bounds on degree, however, are exponential rather than polynomial in… ▽ More The sensitivity of a Boolean function f is the maximum over all inputs x, of the number of sensitive coordinates of x. The well-known sensitivity conjecture of Nisan (see also Nisan and Szegedy) states that every sensitivity-s Boolean function can be computed by a polynomial over the reals of degree poly(s). The best known upper bounds on degree, however, are exponential rather than polynomial in s. Our main result is an approximate version of the conjecture: every Boolean function with sensitivity s can be epsilon-approximated (in L_2) by a polynomial whose degree is O(s log(1/epsilon)). This is the first improvement on the folklore bound of s/epsilon. Further, we show that improving the bound to O(s^c log(1/epsilon)^d)$ for any d < 1 and any c > 0 will imply the sensitivity conjecture. Thus our result is essentially the best one can hope for without proving the conjecture. We postulate a robust analogue of the sensitivity conjecture: if most inputs to a Boolean function f have low sensitivity, then most of the Fourier mass of f is concentrated on small subsets, and present an approach towards proving this conjecture. △ Less

Submitted 25 April, 2016; originally announced April 2016.

Comments: The conference version of this paper will appear in CCC'2016

MSC Class: 68Q10; 68Q15

arXiv:1508.02420 [pdf, ps, other]

Smooth Boolean functions are easy: efficient algorithms for low-sensitivity functions

Authors: Parikshit Gopalan, Noam Nisan, Rocco A. Servedio, Kunal Talwar, Avi Wigderson

Abstract: A natural measure of smoothness of a Boolean function is its sensitivity (the largest number of Hamming neighbors of a point which differ from it in function value). The structure of smooth or equivalently low-sensitivity functions is still a mystery. A well-known conjecture states that every such Boolean function can be computed by a shallow decision tree. While this conjecture implies that smoot… ▽ More A natural measure of smoothness of a Boolean function is its sensitivity (the largest number of Hamming neighbors of a point which differ from it in function value). The structure of smooth or equivalently low-sensitivity functions is still a mystery. A well-known conjecture states that every such Boolean function can be computed by a shallow decision tree. While this conjecture implies that smooth functions are easy to compute in the simplest computational model, to date no non-trivial upper bounds were known for such functions in any computational model, including unrestricted Boolean circuits. Even a bound on the description length of such functions better than the trivial $2^n$ does not seem to have been known. In this work, we establish the first computational upper bounds on smooth Boolean functions: 1) We show that every sensitivity s function is uniquely specified by its values on a Hamming ball of radius 2s. We use this to show that such functions can be computed by circuits of size $n^{O(s)}$. 2) We show that sensitivity s functions satisfy a strong pointwise noise-stability guarantee for random noise of rate O(1/s). We use this to show that these functions have formulas of depth O(s log n). 3) We show that sensitivity s functions can be (locally) self-corrected from worst-case noise of rate $\exp(-O(s))$. All our results are simple, and follow rather directly from (variants of) the basic fact that that the function value at few points in small neighborhoods of a given point determine its function value via a majority vote. Our results confirm various consequences of the conjecture. They may be viewed as providing a new form of evidence towards its validity, as well as new directions towards attacking it. △ Less

Submitted 10 August, 2015; originally announced August 2015.

MSC Class: 68Q15; 68Q17

arXiv:1506.04350 [pdf, ps, other]

Pseudorandomness via the discrete Fourier transform

Authors: Parikshit Gopalan, Daniel Kane, Raghu Meka

Abstract: We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error… ▽ More We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in logspace that have been considered in the literature, including halfspaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL - a central question in complexity theory. Our construction combines ideas from a large body of prior work, ranging from a classical construction of [NN93] to the recent gradually increasing independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some novel analytic machinery which might find other applications. △ Less

Submitted 18 November, 2015; v1 submitted 14 June, 2015; originally announced June 2015.

arXiv:1504.07687 [pdf, ps, other]

Public projects, Boolean functions and the borders of Border's theorem

Authors: Parikshit Gopalan, Noam Nisan, Tim Roughgarden

Abstract: Border's theorem gives an intuitive linear characterization of the feasible interim allocation rules of a Bayesian single-item environment, and it has several applications in economic and algorithmic mechanism design. All known generalizations of Border's theorem either restrict attention to relatively simple settings, or resort to approximation. This paper identifies a complexity-theoretic barrie… ▽ More Border's theorem gives an intuitive linear characterization of the feasible interim allocation rules of a Bayesian single-item environment, and it has several applications in economic and algorithmic mechanism design. All known generalizations of Border's theorem either restrict attention to relatively simple settings, or resort to approximation. This paper identifies a complexity-theoretic barrier that indicates, assuming standard complexity class separations, that Border's theorem cannot be extended significantly beyond the state-of-the-art. We also identify a surprisingly tight connection between Myerson's optimal auction theory, when applied to public project settings, and some fundamental results in the analysis of Boolean functions. △ Less

Submitted 28 April, 2015; originally announced April 2015.

Comments: Accepted to ACM EC 2015

MSC Class: 68Q17; 68Q25

arXiv:1411.4584 [pdf, ps, other]

Pseudorandomness for concentration bounds and signed majorities

Authors: Parikshit Gopalan, Daniel Kane, Raghu Meka

Abstract: The problem of constructing pseudorandom generators that fool halfspaces has been studied intensively in recent times. For fooling halfspaces over the hypercube with polynomially small error, the best construction known requires seed-length O(log^2 n) (MekaZ13). Getting the seed-length down to O(log(n)) is a natural challenge in its own right, which needs to be overcome in order to derandomize RL.… ▽ More The problem of constructing pseudorandom generators that fool halfspaces has been studied intensively in recent times. For fooling halfspaces over the hypercube with polynomially small error, the best construction known requires seed-length O(log^2 n) (MekaZ13). Getting the seed-length down to O(log(n)) is a natural challenge in its own right, which needs to be overcome in order to derandomize RL. In this work we make progress towards this goal by obtaining near-optimal generators for two important special cases: 1) We give a near optimal derandomization of the Chernoff bound for independent, uniformly random bits. Specifically, we show how to generate a x in {1,-1}^n using $\tilde{O}(\log (n/ε))$ random bits such that for any unit vector u, <u,x> matches the sub-Gaussian tail behaviour predicted by the Chernoff bound up to error eps. 2) We construct a generator which fools halfspaces with {0,1,-1} coefficients with error eps with a seed-length of $\tilde{O}(\log(n/ε))$. This includes the important special case of majorities. In both cases, the best previous results required seed-length of $O(\log n + \log^2(1/ε))$. Technically, our work combines new Fourier-analytic tools with the iterative dimension reduction techniques and the gradually increasing independence paradigm of previous works (KaneMN11, CelisRSW13, GopalanMRTV12). △ Less

Submitted 17 November, 2014; originally announced November 2014.

arXiv:1402.3543 [pdf, ps, other]

Inequalities and tail bounds for elementary symmetric polynomial with applications

Authors: Parikshit Gopalan, Amir Yehudayoff

Abstract: We study the extent of independence needed to approximate the product of bounded random variables in expectation, a natural question that has applications in pseudorandomness and min-wise independent hashing. For random variables whose absolute value is bounded by $1$, we give an error bound of the form $σ^{Ω(k)}$ where $k$ is the amount of independence and $σ^2$ is the total variance of the sum… ▽ More We study the extent of independence needed to approximate the product of bounded random variables in expectation, a natural question that has applications in pseudorandomness and min-wise independent hashing. For random variables whose absolute value is bounded by $1$, we give an error bound of the form $σ^{Ω(k)}$ where $k$ is the amount of independence and $σ^2$ is the total variance of the sum. Previously known bounds only applied in more restricted settings, and were quanitively weaker. We use this to give a simpler and more modular analysis of a construction of min-wise independent hash functions and pseudorandom generators for combinatorial rectangles due to Gopalan et al., which also slightly improves their seed-length. Our proof relies on a new analytic inequality for the elementary symmetric polynomials $S_k(x)$ for $x \in \mathbb{R}^n$ which we believe to be of independent interest. We show that if $|S_k(x)|,|S_{k+1}(x)|$ are small relative to $|S_{k-1}(x)|$ for some $k>0$ then $|S_\ell(x)|$ is also small for all $\ell > k$. From these, we derive tail bounds for the elementary symmetric polynomials when the inputs are only $k$-wise independent. △ Less

Submitted 10 August, 2015; v1 submitted 14 February, 2014; originally announced February 2014.

MSC Class: 68Q87; 68W20

arXiv:1311.1704 [pdf, other]

Scalable Recommendation with Poisson Factorization

Authors: Prem Gopalan, Jake M. Hofman, David M. Blei

Abstract: We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factori… ▽ More We develop a Bayesian Poisson matrix factorization model for forming recommendations from sparse user behavior data. These data are large user/item matrices where each user has provided feedback on only a small subset of items, either explicitly (e.g., through star ratings) or implicitly (e.g., through views or purchases). In contrast to traditional matrix factorization approaches, Poisson factorization implicitly models each user's limited attention to consume items. Moreover, because of the mathematical form of the Poisson likelihood, the model needs only to explicitly consider the observed entries in the matrix, leading to both scalable computation and good predictive performance. We develop a variational inference algorithm for approximate posterior inference that scales up to massive data sets. This is an efficient algorithm that iterates over the observed entries and adjusts an approximate posterior over the user/item representations. We apply our method to large real-world user data containing users rating movies, users listening to songs, and users reading scientific papers. In all these settings, Bayesian Poisson factorization outperforms state-of-the-art matrix factorization methods. △ Less

Submitted 20 May, 2014; v1 submitted 7 November, 2013; originally announced November 2013.

arXiv:1308.5158 [pdf, ps, other]

Locally Testable Codes and Cayley Graphs

Authors: Parikshit Gopalan, Salil Vadhan, Yuan Zhou

Abstract: We give two new characterizations of ($\F_2$-linear) locally testable error-correcting codes in terms of Cayley graphs over $\F_2^h$: \begin{enumerate} \item A locally testable code is equivalent to a Cayley graph over $\F_2^h$ whose set of generators is significantly larger than $h$ and has no short linear dependencies, but yields a shortest-path metric that embeds into $\ell_1$ with constant d… ▽ More We give two new characterizations of ($\F_2$-linear) locally testable error-correcting codes in terms of Cayley graphs over $\F_2^h$: \begin{enumerate} \item A locally testable code is equivalent to a Cayley graph over $\F_2^h$ whose set of generators is significantly larger than $h$ and has no short linear dependencies, but yields a shortest-path metric that embeds into $\ell_1$ with constant distortion. This extends and gives a converse to a result of Khot and Naor (2006), which showed that codes with large dual distance imply Cayley graphs that have no low-distortion embeddings into $\ell_1$. \item A locally testable code is equivalent to a Cayley graph over $\F_2^h$ that has significantly more than $h$ eigenvalues near 1, which have no short linear dependencies among them and which "explain" all of the large eigenvalues. This extends and gives a converse to a recent construction of Barak et al. (2012), which showed that locally testable codes imply Cayley graphs that are small-set expanders but have many large eigenvalues. \end{enumerate} △ Less

Submitted 23 August, 2013; originally announced August 2013.

Comments: 22 pages

arXiv:1307.4150 [pdf, ps, other]

Explicit Maximally Recoverable Codes with Locality

Authors: Parikshit Gopalan, Cheng Huang, Bob Jenkins, Sergey Yekhanin

Abstract: Consider a systematic linear code where some (local) parity symbols depend on few prescribed symbols, while other (heavy) parity symbols may depend on all data symbols. Local parities allow to quickly recover any single symbol when it is erased, while heavy parities provide tolerance to a large number of simultaneous erasures. A code as above is maximally-recoverable if it corrects all erasure pat… ▽ More Consider a systematic linear code where some (local) parity symbols depend on few prescribed symbols, while other (heavy) parity symbols may depend on all data symbols. Local parities allow to quickly recover any single symbol when it is erased, while heavy parities provide tolerance to a large number of simultaneous erasures. A code as above is maximally-recoverable if it corrects all erasure patterns which are information theoretically recoverable given the code topology. In this paper we present explicit families of maximally-recoverable codes with locality. We also initiate the study of the trade-off between maximal recoverability and alphabet size. △ Less

Submitted 19 July, 2013; v1 submitted 15 July, 2013; originally announced July 2013.

MSC Class: 94B05

arXiv:1210.0049 [pdf, ps, other]

Better Pseudorandom Generators from Milder Pseudorandom Restrictions

Authors: Parikshit Gopalan, Raghu Meka, Omer Reingold, Luca Trevisan, Salil Vadhan

Abstract: We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs, all of which achieve near-optimal seed-length even in the low-error regime: We get seed-len… ▽ More We present an iterative approach to constructing pseudorandom generators, based on the repeated application of mild pseudorandom restrictions. We use this template to construct pseudorandom generators for combinatorial rectangles and read-once CNFs and a hitting set generator for width-3 branching programs, all of which achieve near-optimal seed-length even in the low-error regime: We get seed-length O(log (n/epsilon)) for error epsilon. Previously, only constructions with seed-length O(\log^{3/2} n) or O(\log^2 n) were known for these classes with polynomially small error. The (pseudo)random restrictions we use are milder than those typically used for proving circuit lower bounds in that we only set a constant fraction of the bits at a time. While such restrictions do not simplify the functions drastically, we show that they can be derandomized using small-bias spaces. △ Less

Submitted 28 September, 2012; originally announced October 2012.

Comments: To appear in FOCS 2012

MSC Class: 68Q17

arXiv:1111.0405 [pdf, ps, other]

Making the long code shorter, with applications to the Unique Games Conjecture

Authors: Boaz Barak, Parikshit Gopalan, Johan Hastad, Raghu Meka, Prasad Raghavendra, David Steurer

Abstract: The long code is a central tool in hardness of approximation, especially in questions related to the unique games conjecture. We construct a new code that is exponentially more efficient, but can still be used in many of these applications. Using the new code we obtain exponential improvements over several known results, including the following: 1. For any eps > 0, we show the existence of an n… ▽ More The long code is a central tool in hardness of approximation, especially in questions related to the unique games conjecture. We construct a new code that is exponentially more efficient, but can still be used in many of these applications. Using the new code we obtain exponential improvements over several known results, including the following: 1. For any eps > 0, we show the existence of an n vertex graph G where every set of o(n) vertices has expansion 1 - eps, but G's adjacency matrix has more than exp(log^delta n) eigenvalues larger than 1 - eps, where delta depends only on eps. This answers an open question of Arora, Barak and Steurer (FOCS 2010) who asked whether one can improve over the noise graph on the Boolean hypercube that has poly(log n) such eigenvalues. 2. A gadget that reduces unique games instances with linear constraints modulo K into instances with alphabet k with a blowup of K^polylog(K), improving over the previously known gadget with blowup of 2^K. 3. An n variable integrality gap for Unique Games that that survives exp(poly(log log n)) rounds of the SDP + Sherali Adams hierarchy, improving on the previously known bound of poly(log log n). We show a connection between the local testability of linear codes and small set expansion in certain related Cayley graphs, and use this connection to derandomize the noise graph on the Boolean hypercube. △ Less

Submitted 2 November, 2011; originally announced November 2011.

Comments: 45 pages

MSC Class: 68Q15

arXiv:1106.3625 [pdf, ps, other]

On the Locality of Codeword Symbols

Authors: Parikshit Gopalan, Cheng Huang, Huseyin Simitci, Sergey Yekhanin

Abstract: Consider a linear [n,k,d]_q code C. We say that that i-th coordinate of C has locality r, if the value at this coordinate can be recovered from accessing some other r coordinates of C. Data storage applications require codes with small redundancy, low locality for information coordinates, large distance, and low locality for parity coordinates. In this paper we carry out an in-depth study of the r… ▽ More Consider a linear [n,k,d]_q code C. We say that that i-th coordinate of C has locality r, if the value at this coordinate can be recovered from accessing some other r coordinates of C. Data storage applications require codes with small redundancy, low locality for information coordinates, large distance, and low locality for parity coordinates. In this paper we carry out an in-depth study of the relations between these parameters. We establish a tight bound for the redundancy n-k in terms of the message length, the distance, and the locality of information coordinates. We refer to codes attaining the bound as optimal. We prove some structure theorems about optimal codes, which are particularly strong for small distances. This gives a fairly complete picture of the tradeoffs between codewords length, worst-case distance and locality of information symbols. We then consider the locality of parity check symbols and erasure correction beyond worst case distance for optimal codes. Using our structure theorem, we obtain a tight bound for the locality of parity symbols possible in such codes for a broad class of parameter settings. We prove that there is a tradeoff between having good locality for parity checks and the ability to correct erasures beyond the minimum distance. △ Less

Submitted 18 June, 2011; originally announced June 2011.

arXiv:1008.3187 [pdf, ps, other]

Polynomial-Time Approximation Schemes for Knapsack and Related Counting Problems using Branching Programs

Authors: Parikshit Gopalan, Adam Klivans, Raghu Meka

Abstract: We give a deterministic, polynomial-time algorithm for approximately counting the number of {0,1}-solutions to any instance of the knapsack problem. On an instance of length n with total weight W and accuracy parameter eps, our algorithm produces a (1 + eps)-multiplicative approximation in time poly(n,log W,1/eps). We also give algorithms with identical guarantees for general integer knapsack, the… ▽ More We give a deterministic, polynomial-time algorithm for approximately counting the number of {0,1}-solutions to any instance of the knapsack problem. On an instance of length n with total weight W and accuracy parameter eps, our algorithm produces a (1 + eps)-multiplicative approximation in time poly(n,log W,1/eps). We also give algorithms with identical guarantees for general integer knapsack, the multidimensional knapsack problem (with a constant number of constraints) and for contingency tables (with a constant number of rows). Previously, only randomized approximation schemes were known for these problems due to work by Morris and Sinclair and work by Dyer. Our algorithms work by constructing small-width, read-once branching programs for approximating the underlying solution space under a carefully chosen distribution. As a byproduct of this approach, we obtain new query algorithms for learning functions of k halfspaces with respect to the uniform distribution on {0,1}^n. The running time of our algorithm is polynomial in the accuracy parameter eps. Previously even for the case of k=2, only algorithms with an exponential dependence on eps were known. △ Less

Submitted 18 August, 2010; originally announced August 2010.

arXiv:1001.1593 [pdf, ps, other]

Fooling functions of halfspaces under product distributions

Authors: P. Gopalan, R. O'Donnell, Y. Wu, D. Zuckerman

Abstract: We construct pseudorandom generators that fool functions of halfspaces (threshold functions) under a very broad class of product distributions. This class includes not only familiar cases such as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and the multivariate Gaussian distribution, but also includes any product of discrete distributions with probab… ▽ More We construct pseudorandom generators that fool functions of halfspaces (threshold functions) under a very broad class of product distributions. This class includes not only familiar cases such as the uniform distribution on the discrete cube, the uniform distribution on the solid cube, and the multivariate Gaussian distribution, but also includes any product of discrete distributions with probabilities bounded away from 0. Our first main result shows that a recent pseudorandom generator construction of Meka and Zuckerman [MZ09], when suitably modifed, can fool arbitrary functions of d halfspaces under product distributions where each coordinate has bounded fourth moment. To eps-fool any size-s, depth-d decision tree of halfspaces, our pseudorandom generator uses seed length O((d log(ds/eps)+log n) log(ds/eps)). For monotone functions of d halfspaces, the seed length can be improved to O((d log(d/eps)+log n) log(d/eps)). We get better bounds for larger eps; for example, to 1/polylog(n)-fool all monotone functions of (log n)= log log n halfspaces, our generator requires a seed of length just O(log n). Our second main result generalizes the work of Diakonikolas et al. [DGJ+09] to show that bounded independence suffices to fool functions of halfspaces under product distributions. Assuming each coordinate satisfies a certain stronger moment condition, we show that any function computable by a size-s, depth-d decision tree of halfspaces is eps-fooled by O(d^4s^2/eps^2)-wise independence. △ Less

Submitted 11 January, 2010; originally announced January 2010.

arXiv:0902.3757 [pdf, ps, other]

Bounded Independence Fools Halfspaces

Authors: Ilias Diakonikolas, Parikshit Gopalan, Ragesh Jaiswal, Rocco Servedio, Emanuele Viola

Abstract: We show that any distribution on {-1,1}^n that is k-wise independent fools any halfspace h with error \eps for k = O(\log^2(1/\eps) /\eps^2). Up to logarithmic factors, our result matches a lower bound by Benjamini, Gurel-Gurevich, and Peled (2007) showing that k = Ω(1/(\eps^2 \cdot \log(1/\eps))). Using standard constructions of k-wise independent distributions, we obtain the first explicit pse… ▽ More We show that any distribution on {-1,1}^n that is k-wise independent fools any halfspace h with error \eps for k = O(\log^2(1/\eps) /\eps^2). Up to logarithmic factors, our result matches a lower bound by Benjamini, Gurel-Gurevich, and Peled (2007) showing that k = Ω(1/(\eps^2 \cdot \log(1/\eps))). Using standard constructions of k-wise independent distributions, we obtain the first explicit pseudorandom generators G: {-1,1}^s --> {-1,1}^n that fool halfspaces. Specifically, we fool halfspaces with error eps and seed length s = k \log n = O(\log n \cdot \log^2(1/\eps) /\eps^2). Our approach combines classical tools from real approximation theory with structural results on halfspaces by Servedio (Computational Complexity 2007). △ Less

Submitted 21 February, 2009; originally announced February 2009.

arXiv:0811.4395 [pdf, ps, other]

List Decoding Tensor Products and Interleaved Codes

Authors: Parikshit Gopalan, Venkatesan Guruswami, Prasad Raghavendra

Abstract: We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes. We show that for {\em every} code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rather than squaring, as one might expect). This gives the first efficient list decoders and new combinator… ▽ More We design the first efficient algorithms and prove new combinatorial bounds for list decoding tensor products of codes and interleaved codes. We show that for {\em every} code, the ratio of its list decoding radius to its minimum distance stays unchanged under the tensor product operation (rather than squaring, as one might expect). This gives the first efficient list decoders and new combinatorial bounds for some natural codes including multivariate polynomials where the degree in each variable is bounded. We show that for {\em every} code, its list decoding radius remains unchanged under $m$-wise interleaving for an integer $m$. This generalizes a recent result of Dinur et al \cite{DGKS}, who proved such a result for interleaved Hadamard codes (equivalently, linear transformations). Using the notion of generalized Hamming weights, we give better list size bounds for {\em both} tensoring and interleaving of binary linear codes. By analyzing the weight distribution of these codes, we reduce the task of bounding the list size to bounding the number of close-by low-rank codewords. For decoding linear transformations, using rank-reduction together with other ideas, we obtain list size bounds that are tight over small fields. △ Less

Submitted 26 November, 2008; originally announced November 2008.

Comments: 32 pages

ACM Class: E.4; F.2.2

arXiv:cs/0609072 [pdf, ps, other]

The Connectivity of Boolean Satisfiability: Computational and Structural Dichotomies

Authors: Parikshit Gopalan, Phokion G. Kolaitis, Elitza Maneva, Christos H. Papadimitriou

Abstract: Boolean satisfiability problems are an important benchmark for questions about complexity, algorithms, heuristics and threshold phenomena. Recent work on heuristics, and the satisfiability threshold has centered around the structure and connectivity of the solution space. Motivated by this work, we study structural and connectivity-related properties of the space of solutions of Boolean satisfia… ▽ More Boolean satisfiability problems are an important benchmark for questions about complexity, algorithms, heuristics and threshold phenomena. Recent work on heuristics, and the satisfiability threshold has centered around the structure and connectivity of the solution space. Motivated by this work, we study structural and connectivity-related properties of the space of solutions of Boolean satisfiability problems and establish various dichotomies in Schaefer's framework. On the structural side, we obtain dichotomies for the kinds of subgraphs of the hypercube that can be induced by the solutions of Boolean formulas, as well as for the diameter of the connected components of the solution space. On the computational side, we establish dichotomy theorems for the complexity of the connectivity and st-connectivity questions for the graph of solutions of Boolean formulas. Our results assert that the intractable side of the computational dichotomies is PSPACE-complete, while the tractable side - which includes but is not limited to all problems with polynomial time algorithms for satisfiability - is in P for the st-connectivity question, and in coNP for the connectivity question. The diameter of components can be exponential for the PSPACE-complete cases, whereas in all other cases it is linear; thus, small diameter and tractability of the connectivity problems are remarkably aligned. The crux of our results is an expressibility theorem showing that in the tractable cases, the subgraphs induced by the solution space possess certain good structural properties, whereas in the intractable cases, the subgraphs can be arbitrary. △ Less

Submitted 3 October, 2007; v1 submitted 13 September, 2006; originally announced September 2006.

Journal ref: Extended abstract in Proceedings of ICALP 2006, pp 346-357

arXiv:cs/0106055 [pdf, ps]

A Seamless Integration of Association Rule Mining with Database Systems

Authors: Raj P. Gopalan, Tariq Nuruddin, Yudho Giri Sucahyo

Abstract: The need for Knowledge and Data Discovery Management Systems (KDDMS) that support ad hoc data mining queries has been long recognized. A significant amount of research has gone into building tightly coupled systems that integrate association rule mining with database systems. In this paper, we describe a seamless integration scheme for database queries and association rule discovery using a comm… ▽ More The need for Knowledge and Data Discovery Management Systems (KDDMS) that support ad hoc data mining queries has been long recognized. A significant amount of research has gone into building tightly coupled systems that integrate association rule mining with database systems. In this paper, we describe a seamless integration scheme for database queries and association rule discovery using a common query optimizer for both. Query trees of expressions in an extended algebra are used for internal representation in the optimizer. The algebraic representation is flexible enough to deal with constrained association rule queries and other variations of association rule specifications. We propose modularization to simplify the query tree for complex tasks in data mining. It paves the way for making use of existing algorithms for constructing query plans in the optimization process. How the integration scheme we present will facilitate greater user control over the data mining process is also discussed. The work described in this paper forms part of a larger project for fully integrating data mining with database management. △ Less

Submitted 28 June, 2001; originally announced June 2001.

Comments: 15 pages

ACM Class: H.2.8

Showing 1–37 of 37 results for author: Gopalan, P