How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad

Authors: Emmanuel Abbe, Samy Bengio, Aryo Lotfi, Colin Sandon, Omid Saremi

Abstract: Can Transformers predict new syllogisms by composing established ones? More generally, what type of targets can be learned by such models from scratch? Recent works show that Transformers can be Turing-complete in terms of expressivity, but this does not address the learnability objective. This paper puts forward the notion of 'distribution locality' to capture when weak learning is efficiently achievable by regular Transformers, where the locality measures the least number of tokens required in addition to the tokens histogram to correlate nontrivially with the target. As shown experimentally and theoretically under additional assumptions, distributions with high locality cannot be learned efficiently. In particular, syllogisms cannot be composed on long chains. Furthermore, we show that (i) an agnostic scratchpad cannot help to break the locality barrier, (ii) an educated scratchpad can help if it breaks the locality at each step, (iii) a notion of 'inductive scratchpad' can both break the locality and improve the out-of-distribution generalization, e.g., generalizing to almost double input size for some arithmetic tasks.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 38 pages, 11 figures

arXiv:2312.04329 [pdf, other]

Reed-Muller codes have vanishing bit-error probability below capacity: a simple tighter proof via camellia boosting

Authors: Emmanuel Abbe, Colin Sandon

Abstract: This paper shows that a class of codes such as Reed-Muller (RM) codes have vanishing bit-error probability below capacity on symmetric channels. The proof relies on the notion of 'camellia codes': a class of symmetric codes decomposable into 'camellias', i.e., set systems that differ from sunflowers by allowing for scattered petal overlaps. The proof then follows from a boosting argument on the camellia petals with second moment Fourier analysis. For erasure channels, this gives a self-contained proof of the bit-error result in Kudekar et al.'17, without relying on sharp thresholds for monotone properties (Friedgut-Kalai'96). For error channels, this gives a shortened proof of Reeves-Pfister'23 with an exponentially tighter bound, and a proof variant of the bit-error result in Abbe-Sandon'23. The control of the full (block) error probability still requires Abbe-Sandon'23 for RM codes.

Submitted 7 December, 2023; originally announced December 2023.

arXiv:2304.02509 [pdf, other]

A proof that Reed-Muller codes achieve Shannon capacity on symmetric channels

Authors: Emmanuel Abbe, Colin Sandon

Abstract: Reed-Muller codes were introduced in 1954, with a simple explicit construction based on polynomial evaluations, and have long been conjectured to achieve Shannon capacity on symmetric channels. Major progress was made towards a proof over the last decades, using combinatorial weight enumerator bounds, a breakthrough on the erasure channel from sharp thresholds, hypercontractivity arguments, and polarization theory. Another major advance recently established that the bit error probability vanishes slowly below capacity. However, when channels allow for errors, the results of Bourgain-Kalai do not apply for converting a vanishing bit to a vanishing block error probability, and neither do the known weight enumerator bounds. The conjecture that RM codes achieve Shannon capacity on symmetric channels, with high probability of recovering the codewords, has thus remained open. This paper closes the conjecture's proof. It uses a new recursive boosting framework, which aggregates the decoding of codeword restrictions on 'subspace-sunflowers', handling their dependencies via an $L_p$ Boolean Fourier analysis, and using a list-decoding argument with a weight enumerator bound from Sberlo-Shpilka. The proof does not require a vanishing bit error probability for the base case, but only a non-trivial probability, obtained here for general symmetric codes.
This gives in particular a shortened and tightened argument for the vanishing bit error probability result of Reeves-Pfister, and with prior works, it implies the strong wire-tap secrecy of RM codes on pure-state classical-quantum channels.

Submitted 5 April, 2023; originally announced April 2023.

arXiv:2210.05893 [pdf, other]

The Power of Two Matrices in Spectral Algorithms

Authors: Souvik Dhara, Julia Gaudio, Elchanan Mossel, Colin Sandon

Abstract: Spectral algorithms are some of the main tools in optimization and inference problems on graphs. Typically, the graph is encoded as a matrix and eigenvectors and eigenvalues of the matrix are then used to solve the given graph problem. Spectral algorithms have been successfully used for graph partitioning, hidden clique recovery and graph coloring. In this paper, we study the power of spectral algorithms using two matrices in a graph partitioning problem. We use two different matrices resulting from two different encodings of the same graph and then combine the spectral information coming from these two matrices. We analyze a two-matrix spectral algorithm for the problem of identifying latent community structure in large random graphs. In particular, we consider the problem of recovering community assignments exactly in the censored stochastic block model, where each edge status is revealed independently with some probability. We show that spectral algorithms based on two matrices are optimal and succeed in recovering communities up to the information theoretic threshold. On the other hand, we show that for most choices of the parameters, any spectral algorithm based on one matrix is suboptimal. This is in contrast to our prior works (2022a, 2022b) which showed that for the symmetric Stochastic Block Model and the Planted Dense Subgraph problem, a spectral algorithm based on one matrix achieves the information theoretic threshold. We additionally provide more general geometric conditions for the (sub)-optimality of spectral algorithms.

Submitted 7 March, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

Comments: 34 pages, 1 figure; Added results on more than two communities; corrected proof of statistical achievability

arXiv:2203.11847 [pdf, other]

Spectral Algorithms Optimally Recover Planted Sub-structures

Authors: Souvik Dhara, Julia Gaudio, Elchanan Mossel, Colin Sandon

Abstract: Spectral algorithms are an important building block in machine learning and graph algorithms. We are interested in studying when such algorithms can be applied directly to provide optimal solutions to inference tasks. Previous works by Abbe, Fan, Wang and Zhong (2020) and by Dhara, Gaudio, Mossel and Sandon (2022) showed the optimality for community detection in the Stochastic Block Model (SBM), as well as in a censored variant of the SBM. Here we show that this optimality is somewhat universal as it carries over to other planted substructures such as the planted dense subgraph problem and submatrix localization problem, as well as to a censored version of the planted dense subgraph problem.

Submitted 11 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 28 pages, 2 figures; New content on submatrix localization

arXiv:2108.04190 [pdf, ps, other]

On the Power of Differentiable Learning versus PAC and SQ Learning

Authors: Emmanuel Abbe, Pritish Kamath, Eran Malach, Colin Sandon, Nathan Srebro

Abstract: We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $ρ$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b ρ$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$, GD can also simulate any sample-based learning algorithm based on $m$ samples. In particular, with polynomially many bits of precision (i.e. when $ρ$ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size. On the other hand, when $b ρ^2$ is large enough, the power of SGD is equivalent to that of SQ learning.

Submitted 5 February, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

arXiv:2106.08393 [pdf, ps, other]

Spoofing Generalization: When Can't You Trust Proprietary Models?

Authors: Ankur Moitra, Elchanan Mossel, Colin Sandon

Abstract: In this work, we study the computational complexity of determining whether a machine learning model that perfectly fits the training data will generalize to unseen data. In particular, we study the power of a malicious agent whose goal is to construct a model g that fits its training data and nothing else, but is indistinguishable from an accurate model f. We say that g strongly spoofs f if no polynomial-time algorithm can tell them apart. If instead we restrict to algorithms that run in $n^c$ time for some fixed $c$, we say that g c-weakly spoofs f. Our main results are: (1) under cryptographic assumptions, strong spoofing is possible, and (2) for any c > 0, c-weak spoofing is possible unconditionally. While the assumption of a malicious agent is an extreme scenario (hopefully companies training large models are not malicious), we believe that it sheds light on the inherent difficulties of blindly trusting large proprietary models or data.

Submitted 23 March, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

arXiv:2101.06178 [pdf, ps, other]

Learning to Sample from Censored Markov Random Fields

Authors: Ankur Moitra, Elchanan Mossel, Colin Sandon

Abstract: We study learning Censored Markov Random Fields (abbreviated CMRFs). These are Markov Random Fields where some of the nodes are censored (not observed). We present an algorithm for learning high-temperature CMRFs within o(n) transportation distance. Crucially, our algorithm makes no assumption about the structure of the graph or the number or location of the observed nodes. We obtain stronger results for high-girth high-temperature CMRFs as well as computational lower bounds indicating that our results cannot be qualitatively improved.

Submitted 15 January, 2021; originally announced January 2021.

arXiv:2001.02992 [pdf, other]

Poly-time universality and limitations of deep learning

Authors: Emmanuel Abbe, Colin Sandon

Abstract: The goal of this paper is to characterize function distributions that deep learning can or cannot learn in poly-time. A universality result is proved for SGD-based deep learning and a non-universality result is proved for GD-based deep learning; this also gives a separation between SGD-based deep learning and statistical query algorithms: (1) Deep learning with SGD is efficiently universal. Any function distribution that can be learned from samples in poly-time can also be learned by a poly-size neural net trained with SGD on a poly-time initialization with poly-steps, poly-rate and possibly poly-noise. Therefore deep learning provides a universal learning paradigm: it was known that the approximation and estimation errors could be controlled with poly-size neural nets, using ERM that is NP-hard; this new result shows that the optimization error can also be controlled with SGD in poly-time. The picture changes for GD with large enough batches: (2) Result (1) does not hold for GD: Neural nets of poly-size trained with GD (full gradients or large enough batches) on any initialization with poly-steps, poly-range and at least poly-noise cannot learn any function distribution that has super-polynomial cross-predictability, where the cross-predictability gives a measure of "average" function correlation; relations and distinctions to the statistical dimension are discussed. In particular, GD with these constraints can learn efficiently monomials of degree $k$ if and only if $k$ is constant.
Thus (1) and (2) point to an interesting contrast: SGD is universal even with some poly-noise while full GD or SQ algorithms are not (e.g., parities).

Submitted 7 January, 2020; originally announced January 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1812.06369

arXiv:1904.05483 [pdf, ps, other]

Parallels Between Phase Transitions and Circuit Complexity?

Authors: Ankur Moitra, Elchanan Mossel, Colin Sandon

Abstract: In many natural average-case problems, there are or there are believed to be critical values in the parameter space where the structure of the space of solutions changes in a fundamental way. These phase transitions are often believed to coincide with drastic changes in the computational complexity of the associated problem. In this work, we study the circuit complexity of inference in the broadcast tree model, which has important applications in phylogenetic reconstruction and close connections to community detection. We establish a number of qualitative connections between phase transitions and circuit complexity in this model. Specifically, we show that there is a $\mathbf{TC}^0$ circuit that competes with the Bayes optimal predictor in some range of parameters above the Kesten-Stigum bound. We also show that there is a 16-label broadcast tree model beneath the Kesten-Stigum bound in which it is possible to accurately guess the label of the root, but beating random guessing is $\mathbf{NC}^1$-hard on average. The key to locating phase transitions is often to study some intrinsic notions of complexity associated with belief propagation, e.g., where do linear statistics fail, or when is the posterior sensitive to noise? Ours is the first work to study the complexity of belief propagation in a way that is grounded in circuit complexity.

Submitted 9 December, 2019; v1 submitted 10 April, 2019; originally announced April 2019.

Comments: The paper was titled: The Circuit Complexity of Inference in the first version

arXiv:1812.06369 [pdf, other]

Provable limitations of deep learning

Authors: Emmanuel Abbe, Colin Sandon

Abstract: As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning. This paper gives a first set of results proving that certain deep learning algorithms fail at learning certain efficiently learnable functions. The results put forward a notion of cross-predictability that characterizes when such failures take place. Parity functions provide an extreme example with a cross-predictability that decays exponentially, while a mere super-polynomial decay of the cross-predictability is shown to be sufficient to obtain failures. Examples in community detection and arithmetic learning are also discussed. Recall that it is known that the class of neural networks (NNs) with polynomial network size can express any function that can be implemented in polynomial time, and that their sample complexity scales polynomially with the network size. The challenge is with the optimization error (the ERM is NP-hard), and the success behind deep learning is to train deep NNs with descent algorithms. The failures shown in this paper apply to training poly-size NNs on function distributions of low cross-predictability with a descent algorithm that is either run with limited memory per sample or that is initialized and run with enough randomness. We further claim that such types of constraints are necessary to obtain failures, in that exact SGD with careful non-random initialization can be shown to learn parities.
The cross-predictability in our results plays a similar role to the statistical dimension in statistical query (SQ) algorithms, with distinctions explained in the paper. The proof techniques are based on exhibiting algorithmic constraints that imply a statistical indistinguishability between the algorithm's output on the test model vs. a null model, using information measures to bound the total variation distance.

Submitted 29 April, 2019; v1 submitted 15 December, 2018; originally announced December 2018.

arXiv:1809.04818 [pdf, other]

doi 10.1137/19M1257135

Graph powering and spectral robustness

Authors: Emmanuel Abbe, Enric Boix, Peter Ralli, Colin Sandon

Abstract: Spectral algorithms, such as principal component analysis and spectral clustering, typically require careful data transformations to be effective: upon observing a matrix $A$, one may look at the spectrum of $ψ(A)$ for a properly chosen $ψ$. The issue is that the spectrum of $A$ might be contaminated by non-informational top eigenvalues, e.g., due to scale variations in the data, and the application of $ψ$ aims to remove these. Designing a good functional $ψ$ (and establishing what good means) is often challenging and model dependent. This paper proposes a simple and generic construction for sparse graphs, $$ψ(A) = \mathbb{1}((I+A)^r \ge 1),$$ where $A$ denotes the adjacency matrix and $r$ is an integer (less than the graph diameter). This produces a graph connecting vertices from the original graph that are within distance $r$, and is referred to as graph powering. It is shown that graph powering regularizes the graph and decontaminates its spectrum in the following sense: (i) If the graph is drawn from the sparse Erdős-Rényi ensemble, which has no spectral gap, it is shown that graph powering produces a 'maximal' spectral gap, with the latter justified by establishing an Alon-Boppana result for powered graphs; (ii) If the graph is drawn from the sparse SBM, graph powering is shown to achieve the fundamental limit for weak recovery (the KS threshold) similarly to \cite{massoulie-STOC}, settling an open problem therein.
Further, graph powering is shown to be significantly more robust to tangles and cliques than previous spectral algorithms based on self-avoiding or nonbacktracking walk counts \cite{massoulie-STOC,Mossel_SBM2,bordenave,colin3}. This is illustrated on a geometric block model that is dense in cliques.

Submitted 13 September, 2018; originally announced September 2018.
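
Illustration (not from the paper): the graph powering operator in the abstract above, $ψ(A) = \mathbb{1}((I+A)^r \ge 1)$, can be sketched in a few lines. The function name `graph_power` and the example path graph are hypothetical, chosen only for illustration. Because all entries of $(I+A)^r$ are nonnegative, the indicator can be computed with Boolean matrix products, which is what this minimal sketch assumes.

```python
def graph_power(adj, r):
    """Return the 0/1 matrix 1((I + A)^r >= 1) for a 0/1 adjacency matrix `adj`.

    Hypothetical helper for illustration. Entry (i, j) is 1 exactly when
    vertex j is within graph distance r of vertex i (the diagonal is 1).
    """
    n = len(adj)
    # One step of I + A: stay in place or move along an edge.
    step = [[i == j or bool(adj[i][j]) for j in range(n)] for i in range(n)]
    # Start from the identity, i.e. (I + A)^0.
    reach = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(r):
        # Boolean matrix product reach * step.
        reach = [
            [any(reach[i][k] and step[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return [[int(x) for x in row] for row in reach]


# Example: the path graph 0 - 1 - 2 - 3, powered with r = 2.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
powered = graph_power(path, 2)
```

In this example vertices 0 and 2 (distance 2) become connected, while vertices 0 and 3 (distance 3) do not.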

arXiv:1512.09080 [pdf, other]

Detection in the stochastic block model with multiple clusters: proof of the achievability conjectures, acyclic BP, and the information-computation gap

Authors: Emmanuel Abbe, Colin Sandon

Abstract: In a paper that initiated the modern study of the stochastic block model, Decelle et al., backed by Mossel et al., made the following conjecture: Denote by $k$ the number of balanced communities, $a/n$ the probability of connecting inside communities and $b/n$ across, and set $\mathrm{SNR}=(a-b)^2/(k(a+(k-1)b))$; for any $k \geq 2$, it is possible to detect communities efficiently whenever $\mathrm{SNR}>1$ (the KS threshold), whereas for $k\geq 4$, it is possible to detect communities information-theoretically for some $\mathrm{SNR}<1$. Massoulié, Mossel et al. and Bordenave et al. succeeded in proving that the KS threshold is efficiently achievable for $k=2$, while Mossel et al. proved that it cannot be crossed information-theoretically for $k=2$. The above conjecture remained open for $k \geq 3$. This paper proves this conjecture, further extending the efficient detection to non-symmetrical SBMs with a generalized notion of detection and KS threshold. For the efficient part, a linearized acyclic belief propagation (ABP) algorithm is developed and proved to detect communities for any $k$ down to the KS threshold in time $O(n \log n)$. Achieving this requires showing optimality of ABP in the presence of cycles, a challenge for message passing algorithms. The paper further connects ABP to a power iteration method with a nonbacktracking operator of generalized order, formalizing the interplay between message passing and spectral methods.
For the information-theoretic (IT) part, a non-efficient algorithm sampling a typical clustering is shown to break down the KS threshold at $k=4$. The emerging gap is shown to be large in some cases; if $a=0$, the KS threshold reads $b \gtrsim k^2$ whereas the IT bound reads $b \gtrsim k \ln(k)$, making the SBM a good study-case for information-computation gaps.

Submitted 14 September, 2016; v1 submitted 30 December, 2015; originally announced December 2015.

Comments: Extended version with further details on the algorithms and methods
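
Illustration (not from the paper): the signal-to-noise ratio in the abstract above, $\mathrm{SNR}=(a-b)^2/(k(a+(k-1)b))$, and the Kesten-Stigum (KS) condition $\mathrm{SNR}>1$ are easy to evaluate numerically. The function names `sbm_snr` and `above_ks_threshold` and the sample parameters are hypothetical, used here only as a minimal sketch.

```python
def sbm_snr(a: float, b: float, k: int) -> float:
    """SNR = (a - b)^2 / (k * (a + (k - 1) * b)) for the symmetric SBM.

    a/n is the within-community edge probability, b/n the across-community
    edge probability, and k the number of balanced communities.
    """
    return (a - b) ** 2 / (k * (a + (k - 1) * b))


def above_ks_threshold(a: float, b: float, k: int) -> bool:
    """True when SNR > 1, the regime where efficient detection is possible."""
    return sbm_snr(a, b, k) > 1


# Example parameters (assumed for illustration): two communities, a = 5, b = 1.
example_snr = sbm_snr(5, 1, 2)  # (5 - 1)^2 / (2 * (5 + 1)) = 16 / 12
```

With $a=0$ the expression reduces to $b/(k(k-1))$, which matches the scaling $b \gtrsim k^2$ quoted in the abstract for the KS threshold.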

arXiv:1506.03729 [pdf, other]

Recovering communities in the general stochastic block model without knowing the parameters

Authors: Emmanuel Abbe, Colin Sandon

Abstract: Most recent developments on the stochastic block model (SBM) rely on the knowledge of the model parameters, or at least on the number of communities. This paper introduces efficient algorithms that do not require such knowledge and yet achieve the optimal information-theoretic tradeoffs identified in [AS15] for linear size communities. The results are three-fold: (i) in the constant degree regime, an algorithm is developed that requires only a lower-bound on the relative sizes of the communities and detects communities with an optimal accuracy scaling for large degrees; (ii) in the regime where degrees are scaled by $ω(1)$ (diverging degrees), this is enhanced into a fully agnostic algorithm that only takes the graph in question and simultaneously learns the model parameters (including the number of communities) and detects communities with accuracy $1-o(1)$, with an overall quasi-linear complexity; (iii) in the logarithmic degree regime, an agnostic algorithm is developed that learns the parameters and achieves the optimal CH-limit for exact recovery, in quasi-linear time. These provide the first algorithms affording efficiency, universality and information-theoretic optimality for strong and weak consistency in the general SBM with linear size communities.

Submitted 11 June, 2015; originally announced June 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1503.00609

arXiv:1503.00609 [pdf, other]

Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms

Authors: Emmanuel Abbe, Colin Sandon

Abstract: New phase transition phenomena have recently been discovered for the stochastic block model, for the special case of two non-overlapping symmetric communities. This gives rise in particular to new algorithmic challenges driven by the thresholds. This paper investigates whether a general phenomenon takes place for multiple communities, without imposing symmetry. In the general stochastic block model $\text{SBM}(n,p,Q)$, $n$ vertices are split into $k$ communities of relative size $\{p_i\}_{i \in [k]}$, and vertices in communities $i$ and $j$ connect independently with probability $Q_{i,j}$, for $i,j \in [k]$. This paper investigates the partial and exact recovery of communities in the general SBM (in the constant and logarithmic degree regimes), and uses the generality of the results to tackle overlapping communities.
The contributions of the paper are: (i) an explicit characterization of the recovery threshold in the general SBM in terms of a new divergence function $D_+$, which generalizes the Hellinger and Chernoff divergences, and which provides an operational meaning to a divergence function analogous to the KL-divergence in the channel coding theorem, (ii) the development of an algorithm that recovers the communities all the way down to the optimal threshold and runs in quasi-linear time, showing that exact recovery has no information-theoretic to computational gap for multiple communities, in contrast to the conjectures made for detection with more than 4 communities; note that the algorithm is optimal both in terms of achieving the threshold and in having quasi-linear complexity, (iii) the development of an efficient algorithm that detects communities in the constant degree regime with an explicit accuracy bound that can be made arbitrarily close to 1 when a prescribed signal-to-noise ratio (defined in terms of the spectrum of $\operatorname{diag}(p)Q$) tends to infinity.

Submitted 4 April, 2015; v1 submitted 2 March, 2015; originally announced March 2015.

arXiv:1401.6528 [pdf, other]

Linear Boolean classification, coding and "the critical problem"

Authors: Emmanuel Abbe, Noga Alon, Afonso S. Bandeira, Colin Sandon

Abstract: The problem of constructing a minimal rank matrix over GF(2) whose kernel does not intersect a given set S is considered. In the case where S is a Hamming ball centered at 0, this is equivalent to finding linear codes of largest dimension. For a general set, this is an instance of "the critical problem" posed by Crapo and Rota in 1970. This work focuses on the case where S is an annulus. As opposed to balls, it is shown that an optimal kernel is composed not only of dense but also of sparse vectors, and the optimal mixture is identified in various cases. These findings corroborate a proposed conjecture that for an annulus of inner and outer radius nq and np respectively, the optimal relative rank is given by (1-q)H(p/(1-q)), an extension of the Gilbert-Varshamov bound H(p) conjectured for Hamming balls of radius np.

Submitted 27 June, 2015; v1 submitted 25 January, 2014; originally announced January 2014.
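
Illustration (not from the paper): the conjectured optimal relative rank (1-q)H(p/(1-q)) quoted in the abstract above can be evaluated directly. This sketch assumes that H denotes the binary entropy function in bits, as is standard for the Gilbert-Varshamov bound; the function names `binary_entropy` and `conjectured_relative_rank` are hypothetical.

```python
import math


def binary_entropy(x: float) -> float:
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0 (assumed meaning of H)."""
    if x < 0.0 or x > 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x == 0.0 or x == 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def conjectured_relative_rank(p: float, q: float) -> float:
    """Evaluate (1 - q) * H(p / (1 - q)) for an annulus with radii nq and np.

    Requires 0 <= q < 1 and 0 <= p <= 1 - q so that the entropy argument
    stays in [0, 1].
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must lie in [0, 1)")
    return (1.0 - q) * binary_entropy(p / (1.0 - q))


# With q = 0 the annulus is a ball and the formula reduces to H(p).
ball_case = conjectured_relative_rank(0.25, 0.0)
# Example annulus (values assumed for illustration): p = 0.25, q = 0.5.
annulus_case = conjectured_relative_rank(0.25, 0.5)  # 0.5 * H(0.5) = 0.5
```

Setting q = 0 recovers the Gilbert-Varshamov expression H(p), which is the consistency check the abstract itself points to.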

Showing 1–16 of 16 results for author: Sandon, C