
Typical and atypical solutions in non-convex neural networks with discrete and continuous weights

Authors: Carlo Baldassi, Enrico M. Malatesta, Gabriele Perugini, Riccardo Zecchina

Abstract: We study the binary and continuous negative-margin perceptrons as simple non-convex neural network models learning random rules and associations. We analyze the geometry of the landscape of solutions in both models and find important similarities and differences. Both models exhibit subdominant minimizers which are extremely flat and wide. These minimizers coexist with a background of dominant solutions which are composed of an exponential number of algorithmically inaccessible small clusters for the binary case (the frozen 1-RSB phase) or a hierarchical structure of clusters of different sizes for the spherical case (the full RSB phase). In both cases, when a certain threshold in constraint density is crossed, the local entropy of the wide flat minima becomes non-monotonic, indicating a break-up of the space of robust solutions into disconnected components. This has a strong impact on the behavior of algorithms in binary models, which cannot access the remaining isolated clusters. For the spherical case the behavior is different, since even beyond the disappearance of the wide flat minima the remaining solutions are shown to always be surrounded by a large number of other solutions at any distance, up to capacity. Indeed, we exhibit numerical evidence that algorithms seem to find solutions up to the SAT/UNSAT transition, which we compute here using a 1RSB approximation. For both models, the generalization performance as a learning device is shown to be greatly improved by the existence of wide flat minimizers even when trained in the highly underconstrained regime of very negative margins.

Submitted 24 July, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

Comments: 34 pages, 13 figures

arXiv:2202.03038 [pdf, other]

doi 10.1088/1742-5468/ac9832

Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry

Authors: Fabrizio Pittorino, Antonio Ferraro, Gabriele Perugini, Christoph Feinauer, Carlo Baldassi, Riccardo Zecchina

Abstract: We systematize the approach to the investigation of deep neural network landscapes by basing it on the geometry of the space of implemented functions rather than the space of parameters. Grouping classifiers into equivalence classes, we develop a standardized parameterization in which all symmetries are removed, resulting in a toroidal topology. On this space, we explore the error landscape rather than the loss. This lets us derive a meaningful notion of the flatness of minimizers and of the geodesic paths connecting them. Using different optimization algorithms that sample minimizers with different flatness we study the mode connectivity and relative distances. Testing a variety of state-of-the-art architectures and benchmark datasets, we confirm the correlation between flatness and generalization performance; we further show that in function space flatter minima are closer to each other and that the barriers along the geodesics connecting them are small. We also find that minimizers found by variants of gradient descent can be connected by zero-error paths composed of two straight lines in parameter space, i.e. polygonal chains with a single bend. We observe similar qualitative results in neural networks with binary weights and activations, providing one of the first results concerning the connectivity in this setting. Our results hinge on symmetry removal, and are in remarkable agreement with the rich phenomenology described by some recent analytical studies performed on simple shallow models.

Submitted 16 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

arXiv:2112.10219 [pdf, other]

Quantum Approximate Optimization Algorithm applied to the binary perceptron

Authors: Pietro Torta, Glen B. Mbeng, Carlo Baldassi, Riccardo Zecchina, Giuseppe E. Santoro

Abstract: We apply digitized Quantum Annealing (QA) and Quantum Approximate Optimization Algorithm (QAOA) to a paradigmatic task of supervised learning in artificial neural networks: the optimization of synaptic weights for the binary perceptron. At variance with the usual QAOA applications to MaxCut, or to quantum spin-chains ground state preparation, the classical Hamiltonian is characterized by highly non-local multi-spin interactions. Yet, we provide evidence for the existence of optimal smooth solutions for the QAOA parameters, which are transferable among typical instances of the same problem, and we prove numerically an enhanced performance of QAOA over traditional QA. We also investigate the role of the QAOA optimization landscape geometry in this problem, showing that the detrimental effect of a gap-closing transition encountered in QA also negatively affects the performance of our implementation of QAOA.

Submitted 19 December, 2021; originally announced December 2021.

Comments: 14 pages, 9 figures

arXiv:2110.14583 [pdf, other]

doi 10.1088/2632-2153/ac7d3b

Deep learning via message passing algorithms based on belief propagation

Authors: Carlo Lucibello, Fabrizio Pittorino, Gabriele Perugini, Riccardo Zecchina

Abstract: Message-passing algorithms based on the Belief Propagation (BP) equations constitute a well-known distributed computational scheme. It is exact on tree-like graphical models and has also proven to be effective in many problems defined on graphs with loops (from inference to optimization, from signal processing to clustering). The BP-based scheme is fundamentally different from stochastic gradient descent (SGD), on which the current success of deep networks is based. In this paper, we present and adapt to mini-batch training on GPUs a family of BP-based message-passing algorithms with a reinforcement field that biases distributions towards locally entropic solutions. These algorithms are capable of training multi-layer neural networks with discrete weights and activations with performance comparable to SGD-inspired heuristics (BinaryNet) and are naturally well-adapted to continual learning. Furthermore, using these algorithms to estimate the marginals of the weights allows us to make approximate Bayesian predictions that have higher accuracy than point-wise solutions.

Submitted 15 March, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Journal ref: Mach. Learn.: Sci. Technol. 3 035005 (2022)

arXiv:2110.00683 [pdf, other]

doi 10.1103/PhysRevE.106.014116

Learning through atypical "phase transitions" in overparameterized neural networks

Authors: Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, Gabriele Perugini, Riccardo Zecchina

Abstract: Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered systems to analytically study the computational fallout of overparameterization in non-convex binary neural network models, trained on data generated from a structurally simpler but "hidden" network. As the number of connection weights increases, we follow the changes of the geometrical structure of different minima of the error loss function and relate them to learning and generalization performance. A first transition happens at the so-called interpolation point, when solutions begin to exist (perfect fitting becomes possible). This transition reflects the properties of typical solutions, which however are in sharp minima and hard to sample. After a gap, a second transition occurs, with the discontinuous appearance of a different kind of "atypical" structures: wide regions of the weight space that are particularly solution-dense and have good generalization properties. The two kinds of solutions coexist, with the typical ones being exponentially more numerous, but empirically we find that efficient algorithms sample the atypical, rare ones. This suggests that the atypical phase transition is the relevant one for learning. The results of numerical tests with realistic networks on observables suggested by the theory are consistent with this scenario.

Submitted 11 June, 2022; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: 28 pages, 14 figures

arXiv:2107.01163 [pdf, other]

doi 10.1103/PhysRevLett.127.278301

Unveiling the structure of wide flat minima in neural networks

Authors: Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Gabriele Perugini, Riccardo Zecchina

Abstract: The success of deep learning has revealed the application potential of neural networks across the sciences and opened up fundamental theoretical problems. In particular, the fact that learning algorithms based on simple variants of gradient methods are able to find near-optimal minima of highly nonconvex loss functions is an unexpected feature of neural networks. Moreover, such algorithms are able to fit the data even in the presence of noise, and yet they have excellent predictive capabilities. Several empirical results have shown a reproducible correlation between the so-called flatness of the minima achieved by the algorithms and the generalization performance. At the same time, statistical physics results have shown that in nonconvex networks a multitude of narrow minima may coexist with a much smaller number of wide flat minima, which generalize well. Here we show that wide flat minima arise as complex extensive structures, from the coalescence of minima around "high-margin" (i.e., locally robust) configurations. Despite being exponentially rare compared to zero-margin ones, high-margin minima tend to concentrate in particular regions. These minima are in turn surrounded by other solutions of smaller and smaller margin, leading to dense regions of solutions over long distances. Our analysis also provides an alternative analytical method for estimating when flat minima appear and when algorithms begin to find solutions, as the number of model parameters varies.

Submitted 14 February, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: 15 pages, 8 figures

arXiv:2010.14761 [pdf, other]

doi 10.1088/1742-5468/abcd31

Wide flat minima and optimal generalization in classifying high-dimensional Gaussian mixtures

Authors: Carlo Baldassi, Enrico M. Malatesta, Matteo Negri, Riccardo Zecchina

Abstract: We analyze the connection between minimizers with good generalizing properties and high local entropy regions of a threshold-linear classifier in Gaussian mixtures with the mean squared error loss function. We show that there exist configurations that achieve the Bayes-optimal generalization error, even in the case of unbalanced clusters. We explore analytically the error-counting loss landscape in the vicinity of a Bayes-optimal solution, and show that the closer we get to such configurations, the higher the local entropy, implying that the Bayes-optimal solution lies inside a wide flat region. We also consider the algorithmically relevant case of targeting wide flat minima of the (differentiable) mean squared error loss. Our analytical and numerical results show not only that in the balanced case the dependence on the norm of the weights is mild, but also, in the unbalanced case, that the performance can be improved.

Submitted 17 November, 2020; v1 submitted 26 October, 2020; originally announced October 2020.

Comments: 19 pages, 4 figures. arXiv admin note: text overlap with arXiv:2006.07897

arXiv:2006.07897 [pdf, other]

doi 10.1088/1742-5468/ac3ae8

Entropic gradient descent algorithms and wide flat minima

Authors: Fabrizio Pittorino, Carlo Lucibello, Christoph Feinauer, Gabriele Perugini, Carlo Baldassi, Elizaveta Demyanenko, Riccardo Zecchina

Abstract: The properties of flat minima in the empirical risk landscape of neural networks have been debated for some time. Increasing evidence suggests they possess better generalization capabilities with respect to sharp ones. First, we discuss Gaussian mixture classification models and show analytically that there exist Bayes optimal pointwise estimators which correspond to minimizers belonging to wide flat regions. These estimators can be found by applying maximum flatness algorithms either directly on the classifier (which is norm independent) or on the differentiable loss function used in learning. Next, we extend the analysis to the deep learning scenario by extensive numerical validations. Using two algorithms, Entropy-SGD and Replicated-SGD, that explicitly include in the optimization objective a non-local flatness measure known as local entropy, we consistently improve the generalization error for common architectures (e.g. ResNet, EfficientNet). An easy to compute flatness measure shows a clear correlation with test accuracy.

Submitted 15 November, 2021; v1 submitted 14 June, 2020; originally announced June 2020.

Comments: ICLR 2021 camera-ready

arXiv:1909.13327 [pdf, other]

Natural representation of composite data with replicated autoencoders

Authors: Matteo Negri, Davide Bergamini, Carlo Baldassi, Riccardo Zecchina, Christoph Feinauer

Abstract: Generative processes in biology and other fields often produce data that can be regarded as resulting from a composition of basic features. Here we present an unsupervised method based on autoencoders for inferring these basic features of data. The main novelty in our approach is that the training is based on the optimization of the 'local entropy' rather than the standard loss, resulting in a more robust inference, and enhancing the performance on this type of data considerably. Algorithmically, this is realized by training an interacting system of replicated autoencoders. We apply this method to synthetic and protein sequence data, and show that it is able to infer a hidden representation that correlates well with the underlying generative process, without requiring any prior knowledge.

Submitted 29 September, 2019; originally announced September 2019.

Comments: 11 pages, 4 figures

arXiv:1907.07578 [pdf, other]

doi 10.1103/PhysRevLett.123.170602

Properties of the geometry of solutions and capacity of multi-layer neural networks with Rectified Linear Units activations

Authors: Carlo Baldassi, Enrico M. Malatesta, Riccardo Zecchina

Abstract: Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural networks with either binary or real-valued weights. We study the problem of storing an extensive number of random patterns and find that, quite unexpectedly, the capacity of the network remains finite as the number of neurons in the hidden layer increases, at odds with the case of threshold units in which the capacity diverges. Possibly more important, a large deviation approach allows us to find that the geometrical landscape of the solution space has a peculiar structure: while the majority of solutions are close in distance but still isolated, there exist rare regions of solutions which are much more dense than the similar ones in the case of threshold units. These solutions are robust to perturbations of the weights and can tolerate large perturbations of the inputs. The analytical results are corroborated by numerical findings.

Submitted 3 May, 2024; v1 submitted 17 July, 2019; originally announced July 2019.

Comments: 11 pages, 3 figures

Journal ref: Phys. Rev. Lett. 123, 170602 (2019)

arXiv:1905.07833 [pdf, other]

doi 10.1073/pnas.1908636117

Shaping the learning landscape in neural networks around wide flat minima

Authors: Carlo Baldassi, Fabrizio Pittorino, Riccardo Zecchina

Abstract: Learning in Deep Neural Networks (DNN) takes place by minimizing a non-convex high-dimensional loss function, typically by a stochastic gradient descent (SGD) strategy. The learning process is observed to be able to find good minimizers without getting stuck in local critical points, and such minimizers are often satisfactory at avoiding overfitting. How these two features can be kept under control in nonlinear devices composed of millions of tunable connections is a profound and far-reaching open question. In this paper we study basic non-convex one- and two-layer neural network models which learn random patterns, and derive a number of basic geometrical and algorithmic features which suggest some answers. We first show that the error loss function presents few extremely wide flat minima (WFM) which coexist with narrower minima and critical points. We then show that the minimizers of the cross-entropy loss function overlap with the WFM of the error loss. We also show examples of learning devices for which WFM do not exist. From the algorithmic perspective we derive entropy-driven greedy and message passing algorithms which focus their search on wide flat regions of minimizers. In the case of SGD and cross-entropy loss, we show that a slow reduction of the norm of the weights along the learning process also leads to WFM. We corroborate the results by a numerical study of the correlations between the volumes of the minimizers, their Hessian and their generalization performance on real data.

Submitted 11 March, 2020; v1 submitted 19 May, 2019; originally announced May 2019.

Comments: 37 pages (16 main text), 10 figures (7 main text)

Journal ref: Proceedings of the National Academy of Sciences, 2020 Jan 7, 117 (1) 161-170

arXiv:1710.09825 [pdf, other]

doi 10.1103/PhysRevLett.120.268103

On the role of synaptic stochasticity in training low-precision neural networks

Authors: Carlo Baldassi, Federica Gerace, Hilbert J. Kappen, Carlo Lucibello, Luca Saglietti, Enzo Tartaglione, Riccardo Zecchina

Abstract: Stochasticity and limited precision of synaptic weights in neural network models are key aspects of both biological and hardware modeling of learning processes. Here we show that a neural network model with stochastic binary weights naturally gives prominence to exponentially rare dense regions of solutions with a number of desirable properties such as robustness and good generalization performance, while typical solutions are isolated and hard to find. Binary solutions of the standard perceptron problem are obtained from a simple gradient descent procedure on a set of real values parametrizing a probability distribution over the binary synapses. Both analytical and numerical results are presented. An algorithmic extension aimed at training discrete deep neural networks is also investigated.

Submitted 19 March, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

Comments: 7 pages + 14 pages of supplementary material

Journal ref: Phys. Rev. Lett. 120, 268103 (2018)

arXiv:1707.00424 [pdf, other]

Parle: parallelizing stochastic gradient descent

Authors: Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar, Adam Oberman

Abstract: We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been shown to lead to improved generalization error for deep networks. Parle requires very infrequent communication with the parameter server and instead performs more computation on each client, which makes it well-suited to both single-machine, multi-GPU settings and distributed implementations.

Submitted 10 September, 2017; v1 submitted 3 July, 2017; originally announced July 2017.

arXiv:1706.08470 [pdf, other]

doi 10.1073/pnas.1711456115

Efficiency of quantum versus classical annealing in non-convex learning problems

Authors: Carlo Baldassi, Riccardo Zecchina

Abstract: Quantum annealers aim at solving non-convex optimization problems by exploiting cooperative tunneling effects to escape local minima. The underlying idea consists in designing a classical energy function whose ground states are the sought optimal solutions of the original optimization problem and adding a controllable quantum transverse field to generate tunneling processes. A key challenge is to identify classes of non-convex optimization problems for which quantum annealing remains efficient while thermal annealing fails. We show that this happens for a wide class of problems which are central to machine learning. Their energy landscapes are dominated by local minima that cause exponential slow down of classical thermal annealers while simulated quantum annealing converges efficiently to rare dense regions of optimal solutions.

Submitted 16 October, 2017; v1 submitted 26 June, 2017; originally announced June 2017.

Comments: 31 pages, 10 figures

Journal ref: Proceedings of the National Academy of Sciences Jan 2018, 201711456

arXiv:1611.01838 [pdf, other]

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

Authors: Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina

Abstract: This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.

Submitted 21 April, 2017; v1 submitted 6 November, 2016; originally announced November 2016.

Comments: ICLR '17
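
Illustrative note: the "two nested loops" structure described in the abstract above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `entropy_sgd_step`, all hyperparameter names and defaults, the exponential running average, and the quadratic toy loss are hypothetical choices made for illustration only.

```python
import numpy as np

def entropy_sgd_step(x, grad_fn, rng, gamma=1.0, inner_steps=20,
                     inner_lr=0.1, outer_lr=0.5, noise=1e-3, alpha=0.75):
    """One outer update: an inner Langevin loop followed by a step toward its running mean.

    All names and default values are illustrative assumptions.
    """
    x_prime = x.copy()   # inner-loop iterate, started at the current weights
    mu = x.copy()        # running average of the inner iterates
    for _ in range(inner_steps):
        # Loss gradient plus a coupling term that pulls x_prime back toward x.
        g = grad_fn(x_prime) + gamma * (x_prime - x)
        # Noisy (Langevin-style) gradient step.
        x_prime = (x_prime - inner_lr * g
                   + np.sqrt(inner_lr) * noise * rng.standard_normal(x.shape))
        # Exponential running average (an assumed averaging scheme).
        mu = (1.0 - alpha) * mu + alpha * x_prime
    # gamma * (x - mu) serves as the estimate of the local-entropy gradient.
    return x - outer_lr * gamma * (x - mu)

def run_toy_example(steps=50, seed=0):
    """Minimize the toy loss f(x) = 0.5 * ||x||^2, whose gradient is x."""
    rng = np.random.default_rng(seed)
    x = np.array([2.0, -3.0])
    for _ in range(steps):
        x = entropy_sgd_step(x, lambda w: w, rng)
    return x

if __name__ == "__main__":
    print(np.linalg.norm(run_toy_example()))
```

On the toy quadratic the outer iterate should contract toward the origin; the sketch only shows the control flow (inner noisy loop, outer update) and makes no claim about performance on real networks.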

arXiv:1605.06444 [pdf, other]

doi 10.1073/pnas.1608103113

Unreasonable Effectiveness of Learning Neural Networks: From Accessible States and Robust Ensembles to Basic Algorithmic Schemes

Authors: Carlo Baldassi, Christian Borgs, Jennifer Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, Riccardo Zecchina

Abstract: In artificial neural networks, learning from data is a computationally demanding task in which a large number of connection weights are iteratively tuned through stochastic-gradient-based heuristic processes over a cost-function. It is not well understood how learning occurs in these systems, in particular how they avoid getting trapped in configurations with poor computational performance. Here we study the difficult case of networks with discrete weights, where the optimization landscape is very rough even for simple architectures, and provide theoretical and numerical evidence of the existence of rare - but extremely dense and accessible - regions of configurations in the network weight space. We define a novel measure, which we call the "robust ensemble" (RE), which suppresses trapping by isolated configurations and amplifies the role of these dense regions. We analytically compute the RE in some exactly solvable models, and also provide a general algorithmic scheme which is straightforward to implement: define a cost-function given by a sum of a finite number of replicas of the original cost-function, with a constraint centering the replicas around a driving assignment. To illustrate this, we derive several powerful new algorithms, ranging from Markov Chains to message passing to gradient descent processes, where the algorithms target the robust dense states, resulting in substantial improvements in performance. The weak dependence on the number of precision bits of the weights leads us to conjecture that very similar reasoning applies to more conventional neural networks. Analogous algorithmic schemes can also be applied to other optimization problems.

Submitted 6 October, 2016; v1 submitted 20 May, 2016; originally announced May 2016.

Comments: 31 pages (14 main text, 18 appendix), 12 figures (6 main text, 6 appendix)

Journal ref: Proc. Natl. Acad. Sci. U.S.A. 113(48):E7655-E7662, 2016

arXiv:1309.3984 [pdf, other]

Stochastic Optimization of Service Provision with Selfish Users

Authors: F. Altarelli, A. Braunstein, C. F. Chiasserini, L. Dall'Asta, P. Giaccone, E. Leonardi, R. Zecchina

Abstract: We develop a computationally efficient technique to solve a fairly general distributed service provision problem with selfish users and imperfect information. In particular, in a context in which the service capacity of the existing infrastructure can be partially adapted to the user load by activating just some of the service units, we aim at finding the configuration of active service units that achieves the best trade-off between maintenance (e.g. energetic) costs for the provider and user satisfaction. The core of our technique resides in the implementation of a belief-propagation (BP) algorithm to evaluate the cost configurations. Numerical results confirm the effectiveness of our approach.

Submitted 16 September, 2013; originally announced September 2013.

Comments: paper presented at NETSTAT Workshop, Budapest - June 2013

arXiv:1309.2805 [pdf, other]

doi 10.1103/PhysRevX.4.021024

Containing epidemic outbreaks by message-passing techniques

Authors: F. Altarelli, A. Braunstein, L. Dall'Asta, J. R. Wakeling, R. Zecchina

Abstract: The problem of targeted network immunization can be defined as the one of finding a subset of nodes in a network to immunize or vaccinate in order to minimize a tradeoff between the cost of vaccination and the final (stationary) expected infection under a given epidemic model. Although computing the expected infection is a hard computational problem, simple and efficient mean-field approximations have been put forward in the literature in recent years. The optimization problem can be recast into a constrained one in which the constraints enforce local mean-field equations describing the average stationary state of the epidemic process. For a wide class of epidemic models, including the susceptible-infected-removed and the susceptible-infected-susceptible models, we define a message-passing approach to network immunization that allows us to study the statistical properties of epidemic outbreaks in the presence of immunized nodes as well as to find (nearly) optimal immunization sets for a given choice of parameters and costs. The algorithm scales linearly with the size of the graph and it can be made efficient even on large networks. We compare its performance with topologically based heuristics, greedy methods, and simulated annealing.

Submitted 11 September, 2013; originally announced September 2013.

Journal ref: Phys. Rev. X 4, 021024, 2014

arXiv:1309.0346 [pdf, other]

doi 10.1103/PhysRevE.86.026706

On the performance of a cavity method based algorithm for the Prize-Collecting Steiner Tree Problem on graphs

Authors: Indaco Biazzo, Alfredo Braunstein, Riccardo Zecchina

Abstract: We study the behavior of an algorithm derived from the cavity method for the Prize-Collecting Steiner Tree (PCST) problem on graphs. The algorithm is based on the zero temperature limit of the cavity equations and as such is formally simple (a fixed point equation resolved by iteration) and distributed (parallelizable). We provide a detailed comparison with state-of-the-art algorithms on a wide range of existing benchmark networks and random graphs. Specifically, we consider an enhanced derivative of the Goemans-Williamson heuristics and the DHEA solver, a Branch and Cut Linear/Integer Programming based approach. The comparison shows that the cavity algorithm outperforms the two algorithms in most large instances both in running time and quality of the solution. Finally we prove a few optimality properties of the solutions provided by our algorithm, including optimality under the two post-processing procedures defined in the Goemans-Williamson derivative and global optimality in some limit cases.

Submitted 2 September, 2013; originally announced September 2013.

Journal ref: Phys. Rev. E 86, 026706 (2012)

arXiv:1307.6786 [pdf, other]

doi 10.1103/PhysRevLett.112.118701

Bayesian inference of epidemics on networks via Belief Propagation

Authors: Fabrizio Altarelli, Alfredo Braunstein, Luca Dall'Asta, Alejandro Lage-Castellanos, Riccardo Zecchina

Abstract: We study several Bayesian inference problems for irreversible stochastic epidemic models on networks from a statistical physics viewpoint. We derive equations that allow one to accurately compute the posterior distribution of the time evolution of the state of each node given some observations. Unlike most existing methods, we allow very general observation models, including unobserved nodes, state observations made at different or unknown times, and observations of infection times, possibly mixed together. Our method, which is based on the Belief Propagation algorithm, is efficient, naturally distributed, and exact on trees. As a particular case, we consider the problem of finding the "zero patient" of a SIR or SI epidemic given a snapshot of the state of the network at a later unknown time. Numerical simulations show that our method outperforms previous ones on both synthetic and real networks, often by a very large margin.

Submitted 27 March, 2014; v1 submitted 25 July, 2013; originally announced July 2013.

Journal ref: Phys. Rev. Lett. 112, 118701 (2014)

arXiv:1203.1426 [pdf, other]

doi 10.1088/1742-5468/2013/09/P09011

Optimizing spread dynamics on graphs by message passing

Authors: Fabrizio Altarelli, Alfredo Braunstein, Luca Dall'Asta, Riccardo Zecchina

Abstract: Cascade processes are responsible for many important phenomena in natural and social sciences. Simple models of irreversible dynamics on graphs, in which nodes activate depending on the state of their neighbors, have been successfully applied to describe cascades in a large variety of contexts. Over the last decades, much effort has been devoted to understanding the typical behaviour of the cascades arising from initial conditions extracted at random from some given ensemble. However, the problem of optimizing the trajectory of the system, i.e. of identifying appropriate initial conditions to maximize (or minimize) the final number of active nodes, is still considered to be practically intractable, with the only exception of models that satisfy a sort of diminishing returns property called submodularity. Submodular models can be approximately solved by means of greedy strategies, but by definition they lack cooperative characteristics which are fundamental in many real systems. Here we introduce an efficient algorithm based on statistical physics for the optimization of trajectories in cascade processes on graphs. We show that for a wide class of irreversible dynamics, even in the absence of submodularity, the spread optimization problem can be solved efficiently on large networks. Analytic and algorithmic results on random graphs are complemented by the solution of the spread maximization problem on a real-world network (the Epinions consumer reviews network).

Submitted 24 May, 2013; v1 submitted 7 March, 2012; originally announced March 2012.

Comments: Replacement for "The Spread Optimization Problem"

Journal ref: J. Stat. Mech. 2013, P09011 (2013)

arXiv:1202.2536 [pdf, ps, other]

doi 10.1088/1742-5468/2012/05/P05025

Message passing for quantified Boolean formulas

Authors: Pan Zhang, Abolfazl Ramezanpour, Lenka Zdeborová, Riccardo Zecchina

Abstract: We introduce two types of message passing algorithms for quantified Boolean formulas (QBF). The first type is a message-passing-based heuristic that can prove unsatisfiability of the QBF by assigning the universal variables in such a way that the remaining formula is unsatisfiable. In the second type, we use message passing to guide the branching heuristic of a Davis-Putnam-Logemann-Loveland (DPLL) complete solver. Numerical experiments show that on random QBFs our branching heuristic gives a robust exponential efficiency gain with respect to state-of-the-art solvers. We also manage to solve some previously unsolved benchmarks from the QBFLIB library. Apart from this, our study sheds light on using message passing in small systems and as subroutines in complete solvers.

Submitted 12 February, 2012; originally announced February 2012.

Comments: 14 pages, 7 figures

Journal ref: J. Stat. Mech. (2012) P05025

arXiv:1110.5091 [pdf, other]

3D Protein Structure Predicted from Sequence

Authors: Debora S. Marks, Lucy J. Colwell, Robert Sheridan, Thomas A. Hopf, Andrea Pagnani, Riccardo Zecchina, Chris Sander

Abstract: The evolutionary trajectory of a protein through sequence space is constrained by function and three-dimensional (3D) structure. Residues in spatial proximity tend to co-evolve, yet attempts to invert the evolutionary record to identify these constraints and use them to computationally fold proteins have so far been unsuccessful. Here, we show that co-variation of residue pairs, observed in a large protein family, provides sufficient information to determine 3D protein structure. Using a data-constrained maximum entropy model of the multiple sequence alignment, we identify pairs of statistically coupled residue positions which are expected to be close in the protein fold, termed contacts inferred from evolutionary information (EICs). To assess the amount of information about the protein fold contained in these coupled pairs, we evaluate the accuracy of predicted 3D structures for proteins of 50-260 residues, from 15 diverse protein families, including a G-protein coupled receptor. These structure predictions are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The resulting low Cα-RMSD error range of 2.7-5.1Å, over at least 75% of the protein, indicates the potential for predicting essentially correct 3D structures for the thousands of protein families that have no known structure, provided they include a sufficiently large number of divergent sample sequences.
With the current enormous growth in sequence information based on new sequencing technology, this opens the door to a comprehensive survey of protein 3D structures, including many not currently accessible to the experimental methods of structural genomics. This advance has potential applications in many biological contexts, such as synthetic biology, identification of functional sites in proteins and interpretation of the functional impact of genetic variants.

Submitted 25 October, 2011; v1 submitted 23 October, 2011; originally announced October 2011.

Comments: Debora S Marks and Lucy J Colwell are joint first authors. Supplement and Appendices at: http://cbio.mskcc.org/foldingproteins. Updated version 25-Oct-2011 with '3D' added to the title and corrections of details in the methods section to make it compatible with derivation of equations in the main text and in the supplement

arXiv:1108.6239 [pdf, ps, other]

doi 10.1103/PhysRevE.84.051111

Efficient data compression from statistical physics of codes over finite fields

Authors: Alfredo Braunstein, Farbod Kayhan, Riccardo Zecchina

Abstract: In this paper we discuss a novel data compression technique for binary symmetric sources based on the cavity method over a Galois Field of order q (GF(q)). We present a scheme of low complexity and near optimal empirical performance. The compression step is based on a reduction of sparse low density parity check codes over GF(q) and is done through the so-called reinforced belief-propagation equations. These reduced codes appear to have a non-trivial geometrical modification of the space of codewords which makes such compression computationally feasible. The computational complexity is O(d.n.q.log(q)) per iteration, where d is the average degree of the check nodes and n is the number of bits. For our code ensemble, decompression can be done in a time linear in the code's length by a simple leaf-removal algorithm.

Submitted 13 October, 2011; v1 submitted 31 August, 2011; originally announced August 2011.

Comments: 10 pages, 4 figures

Journal ref: Phys. Rev. E 84, 051111 (2011)

arXiv:1108.6160 [pdf, other]

doi 10.1088/1742-5468/2011/11/P11009

Stochastic optimization by message passing

Authors: Fabrizio Altarelli, Alfredo Braunstein, Abolfazl Ramezanpour, Riccardo Zecchina

Abstract: Most optimization problems in applied sciences realistically involve uncertainty in the parameters defining the cost function, of which only statistical information is known beforehand. In a recent work we introduced a message passing algorithm based on the cavity method of statistical physics to solve the two-stage matching problem with independently distributed stochastic parameters. In this paper we provide an in-depth explanation of the general method and caveats, show the details of the derivation and resulting algorithm for the matching problem and apply it to a stochastic version of the independent set problem, which is a computationally hard and relevant problem in communication networks. We compare the results with some greedy algorithms and briefly discuss the extension to more complicated stochastic multi-stage problems.

Submitted 31 August, 2011; originally announced August 2011.

Comments: 31 pages, 8 figures

Journal ref: J. Stat. Mech. (2011) P11009

arXiv:1104.1929 [pdf, ps, other]

doi 10.1140/epjb/e2011-10963-x

Statistical physics approach to graphical games: local and global interactions

Authors: A. Ramezanpour, J. Realpe-Gomez, R. Zecchina

Abstract: In a graphical game agents play with their neighbors on a graph to achieve an appropriate state of equilibrium. Here relevant problems are characterizing the equilibrium set and discovering efficient algorithms to find such an equilibrium (solution). We consider a representation of games that extends over graphical games to deal conveniently with both local and global interactions and use the cavity method of statistical physics to study the geometrical structure of the equilibria space. The method also provides a distributive and local algorithm to find an equilibrium. For simplicity we consider only pure Nash equilibria but the methods can as well be extended to deal with (approximated) mixed Nash equilibria.

Submitted 11 April, 2011; originally announced April 2011.

Comments: 32 pages, 11 figures, to be published in Eur. Phys. J. B

Journal ref: Eur. Phys. J. B 81, 327 (2011)

arXiv:1101.4573 [pdf, other]

doi 10.1073/pnas.1004751108

Finding undetected protein associations in cell signaling by belief propagation

Authors: M. Bailly-Bechet, C. Borgs, A. Braunstein, J. Chayes, A. Dagkessamanskaia, J. -M. François, R. Zecchina

Abstract: External information propagates in the cell mainly through signaling cascades and transcriptional activation, allowing it to react to a wide spectrum of environmental changes. High throughput experiments identify numerous molecular components of such cascades that may, however, interact through unknown partners. Some of them may be detected using data coming from the integration of a protein-protein interaction network and mRNA expression profiles. This inference problem can be mapped onto the problem of finding appropriate optimal connected subgraphs of a network defined by these datasets. The optimization procedure turns out to be computationally intractable in general. Here we present a new distributed algorithm for this task, inspired by statistical physics, and apply this scheme to alpha factor and drug perturbations data in yeast. We identify the role of the COS8 protein, a member of a gene family of previously unknown function, and validate the results by genetic experiments. The algorithm we present is especially suited for very large datasets, can run in parallel, and can be adapted to other problems in systems biology. On renowned benchmarks it outperforms other algorithms in the field.

Submitted 24 January, 2011; originally announced January 2011.

Comments: 6 pages, 3 figures, 1 table, Supporting Information

Journal ref: Published online before print December 27, 2010, doi: 10.1073/pnas.1004751108 PNAS January 11, 2011 vol. 108 no. 2 882-887

arXiv:1003.6124

Statistical physics of optimization under uncertainty

Authors: Fabrizio Altarelli, Alfredo Braunstein, Abolfazl Ramezanpour, Riccardo Zecchina

Abstract: Optimization under uncertainty deals with the problem of optimizing stochastic cost functions given some partial information on their inputs. These problems are extremely difficult to solve and yet pervade all areas of technological and natural sciences. We propose a general approach to solve such large-scale stochastic optimization problems and a Survey Propagation based algorithm that implements it. In the problems we consider, some of the parameters are not known at the time of the first optimization, but are extracted later independently of each other from known distributions. As an illustration, we apply our method to the stochastic bipartite matching problem, in the two-stage and multi-stage cases. The efficiency of our approach, which does not rely on sampling techniques, allows us to validate the analytical predictions with large-scale numerical simulations.

Submitted 1 September, 2011; v1 submitted 31 March, 2010; originally announced March 2010.

Comments: This article has been withdrawn because it was replaced by arXiv:1105.3657 with a different name

arXiv:0910.0767 [pdf, ps, other]

doi 10.1088/1742-5468/2009/12/P12010

Clustering with shallow trees

Authors: M. Bailly-Bechet, S. Bradde, A. Braunstein, A. Flaxman, L. Foini, R. Zecchina

Abstract: We propose a new method for hierarchical clustering based on the optimisation of a cost function over trees of limited depth, and we derive a message-passing method that allows it to be solved efficiently. The method and algorithm can be interpreted as a natural interpolation between two well-known approaches, namely single linkage and the recently presented Affinity Propagation. We analyze with this general scheme three biological/medical structured datasets (human population based on genetic information, proteins based on sequences and verbal autopsies) and show that the interpolation technique provides new insight.

Submitted 23 November, 2009; v1 submitted 5 October, 2009; originally announced October 2009.

Comments: 11 pages, 7 figures

Journal ref: J. Stat. Mech. (2009) P12010

arXiv:0905.1893 [pdf, ps, other]

doi 10.1209/0295-5075/89/37009

Aligning graphs and finding substructures by a cavity approach

Authors: S. Bradde, A. Braunstein, H. Mahmoudi, F. Tria, M. Weigt, R. Zecchina

Abstract: We introduce a new distributed algorithm for aligning graphs or finding substructures within a given graph. It is based on the cavity method and is used to study the maximum-clique and the graph-alignment problems in random graphs. The algorithm allows the analysis of large graphs and may find applications in fields such as computational biology. As a proof of concept we use our algorithm to align the similarity graphs of two interacting protein families involved in bacterial signal transduction, and to predict actually interacting protein partners between these families.

Submitted 1 April, 2010; v1 submitted 12 May, 2009; originally announced May 2009.

Comments: 5 pages, 4 figures

Journal ref: 2010 Europhys. Lett. 89 37009

arXiv:0903.2429 [pdf, ps, other]

doi 10.1088/1742-5468/2009/07/P07002

Statistical mechanics of budget-constrained auctions

Authors: F. Altarelli, A. Braunstein, J. Realpe-Gomez, R. Zecchina

Abstract: Finding the optimal assignment in budget-constrained auctions is a combinatorial optimization problem with many important applications, a notable example being the sale of advertisement space by search engines (in this context the problem is often referred to as the off-line AdWords problem). Based on the cavity method of statistical mechanics, we introduce a message passing algorithm that is capable of solving efficiently random instances of the problem extracted from a natural distribution, and we derive from its properties the phase diagram of the problem. As the control parameter (average value of the budgets) is varied, we find two phase transitions delimiting a region in which long-range correlations arise.

Submitted 27 April, 2009; v1 submitted 13 March, 2009; originally announced March 2009.

Comments: Minor revision

Journal ref: JSTAT 2009;2009:P07002 (27pp)

arXiv:0901.4467 [pdf, ps, other]

doi 10.1109/ISIT.2009.5205707

Efficient LDPC Codes over GF(q) for Lossy Data Compression

Authors: Alfredo Braunstein, Farbod Kayhan, Riccardo Zecchina

Abstract: In this paper we consider the lossy compression of a binary symmetric source. We present a scheme that provides a low complexity lossy compressor with near optimal empirical performance. The proposed scheme is based on b-reduced ultra-sparse LDPC codes over GF(q). Encoding is performed by the Reinforced Belief Propagation algorithm, a variant of Belief Propagation. The computational complexity at the encoder is O(<d>.n.q.log q), where <d> is the average degree of the check nodes. For our code ensemble, decoding can be performed iteratively following the inverse steps of the leaf removal algorithm. For a sparse parity-check matrix the number of needed operations is O(n).

Submitted 7 October, 2011; v1 submitted 28 January, 2009; originally announced January 2009.

Comments: 5 pages, 3 figures

Journal ref: In: IEEE International Symposium on Information Theory, 2009. ISIT 2009. Seoul, Korea; 2009

arXiv:0901.1684 [pdf, ps, other]

doi 10.1063/1.2982805

A rigorous analysis of the cavity equations for the minimum spanning tree

Authors: M. Bayati, A. Braunstein, R. Zecchina

Abstract: We analyze a new general representation for the Minimum Weight Steiner Tree (MST) problem which translates the topological connectivity constraint into a set of local conditions which can be analyzed by the so-called cavity equations techniques. For the limit case of the Spanning tree we prove that the fixed point of the algorithm arising from the cavity equations leads to the global optimum.

Submitted 12 January, 2009; originally announced January 2009.

Comments: 5 pages, 1 figure

Journal ref: J. Math. Phys. 49, 125206 (2008)

arXiv:0801.2890 [pdf, ps, other]

doi 10.1103/PhysRevE.77.031118

Entropy landscape and non-Gibbs solutions in constraint satisfaction problems

Authors: L. Dall'Asta, A. Ramezanpour, R. Zecchina

Abstract: We study the entropy landscape of solutions for the bicoloring problem in random graphs, a representative difficult constraint satisfaction problem. Our goal is to classify which type of clusters of solutions are addressed by different algorithms. In the first part of the study we use the cavity method to obtain the number of clusters with a given internal entropy and determine the phase diagram of the problem, e.g. dynamical, rigidity and SAT-UNSAT transitions. In the second part of the paper we analyze different algorithms and locate their behavior in the entropy landscape of the problem. For instance we show that a smoothed version of a decimation strategy based on Belief Propagation is able to find solutions belonging to sub-dominant clusters even beyond the so-called rigidity transition where the thermodynamically relevant clusters become frozen. These non-equilibrium solutions belong to the most probable unfrozen clusters.

Submitted 18 January, 2008; originally announced January 2008.

Comments: 38 pages, 10 figures

Journal ref: Phys. Rev. E 77, 031118 (2008)

arXiv:0709.1190 [pdf, ps, other]

doi 10.1137/090753115

Belief-Propagation for Weighted b-Matchings on Arbitrary Graphs and its Relation to Linear Programs with Integer Solutions

Authors: Mohsen Bayati, Christian Borgs, Jennifer Chayes, Riccardo Zecchina

Abstract: We consider the general problem of finding the minimum weight $b$-matching on arbitrary graphs. We prove that, whenever the linear programming (LP) relaxation of the problem has no fractional solutions, then the belief propagation (BP) algorithm converges to the correct solution. We also show that when the LP relaxation has a fractional solution then the BP algorithm can be used to solve the LP relaxation. Our proof is based on the notion of graph covers and extends the analysis of (Bayati-Shah-Sharma 2005 and Huang-Jebara 2007). These results are notable in the following regards: (1) It is one of a very small number of proofs showing correctness of BP without any constraint on the graph structure. (2) Variants of the proof work for both synchronous and asynchronous BP; it is the first proof of convergence and correctness of an asynchronous BP algorithm for a combinatorial optimization problem.

Submitted 4 August, 2011; v1 submitted 8 September, 2007; originally announced September 2007.

Comments: 28 pages, 2 figures. Submitted to SIAM journal on Discrete Mathematics on March 19, 2009; accepted for publication (in revised form) August 30, 2010; published electronically July 1, 2011

Journal ref: SIAM J. Discrete Math. 2011, Vol 25, Issue 2, pp. 989-1011

arXiv:0707.1295 [pdf, ps, other]

doi 10.1073/pnas.0700324104

Efficient supervised learning in networks with binary synapses

Authors: Carlo Baldassi, Alfredo Braunstein, Nicolas Brunel, Riccardo Zecchina

Abstract: Recent experimental studies indicate that synaptic changes induced by neuronal activity are discrete jumps between a small number of stable states. Learning in systems with discrete synapses is known to be a computationally hard problem. Here, we study a neurobiologically plausible on-line learning algorithm that derives from Belief Propagation algorithms. We show that it performs remarkably well in a model neuron with binary synapses, and a finite number of 'hidden' states per synapse, that has to learn a random classification task. Such a system is able to learn a number of associations close to the theoretical limit, in time which is sublinear in system size. This is to our knowledge the first on-line algorithm that is able to achieve efficiently a finite number of patterns learned per binary synapse. Furthermore, we show that performance is optimal for a finite number of hidden states which becomes very small for sparse coding. The algorithm is similar to the standard 'perceptron' learning algorithm, with an additional rule for synaptic transitions which occur only if a currently presented pattern is 'barely correct'. In this case, the synaptic changes are meta-plastic only (change in hidden states and not in actual synaptic state), stabilizing the synapse in its current state. Finally, we show that a system with two visible states and K hidden states is much more robust to noise than a system with K visible states. We suggest this rule is sufficiently simple to be easily implemented by neurobiological systems or in hardware.

Submitted 9 July, 2007; originally announced July 2007.

Comments: 10 pages, 4 figures

Journal ref: PNAS 104, 11079-11084 (2007)

arXiv:0705.0423 [pdf, ps, other]

doi 10.1109/ISIT.2007.4557497

Encoding for the Blackwell Channel with Reinforced Belief Propagation

Authors: A. Braunstein, F. Kayhan, G. Montorsi, R. Zecchina

Abstract: A key idea in coding for the broadcast channel (BC) is binning, in which the transmitter encodes information by selecting a codeword from an appropriate bin (the messages are thus the bin indexes). This selection is normally done by solving an appropriate (possibly difficult) combinatorial problem. Recently it has been shown that binning for the Blackwell channel --a particular BC-- can be done by iterative schemes based on Survey Propagation (SP). This method uses decimation for SP and suffers a complexity of O(n^2). In this paper we propose a new variation of the Belief Propagation (BP) algorithm, named Reinforced BP algorithm, that turns BP into a solver. Our simulations show that this new algorithm has complexity O(n log n). Using this new algorithm together with a non-linear coding scheme, we can efficiently achieve rates close to the border of the capacity region of the Blackwell channel.

Submitted 3 May, 2007; originally announced May 2007.

Comments: 5 pages, 8 figures, submitted to ISIT 2007

Journal ref: IEEE International Symposium on Information Theory (ISIT07); 2007. p. 1891-5

arXiv:cond-mat/0511159 [pdf, ps, other]

doi 10.1103/PhysRevLett.96.030201

Learning by message-passing in networks of discrete synapses

Authors: Alfredo Braunstein, Riccardo Zecchina

Abstract: We show that a message-passing process allows one to store in binary "material" synapses a number of random patterns which almost saturates the information theoretic bounds. We apply the learning algorithm to networks characterized by a wide range of different connection topologies and of size comparable with that of biological systems (e.g. $n\simeq10^{5}-10^{6}$). The algorithm can be turned into an on-line --fault tolerant-- learning protocol of potential interest in modeling aspects of synaptic plasticity and in building neuromorphic devices.

Submitted 9 December, 2005; v1 submitted 7 November, 2005; originally announced November 2005.

Comments: 4 pages, 3 figures; references updated and minor corrections; accepted in PRL

Journal ref: Phys. Rev. Lett. 96, 030201 (2006)

arXiv:cond-mat/0506053 [pdf, ps, other]

doi 10.1016/j.tcs.2008.01.005

Pairs of SAT Assignment in Random Boolean Formulae

Authors: Hervé Daudé, Marc Mezard, Thierry Mora, Riccardo Zecchina

Abstract: We investigate geometrical properties of the random K-satisfiability problem using the notion of x-satisfiability: a formula is x-satisfiable if there exist two SAT assignments differing in Nx variables. We show the existence of a sharp threshold for this property as a function of the clause density. For large enough K, we prove that there exists a region of clause density, below the satisfiability threshold, where the landscape of Hamming distances between SAT assignments experiences a gap: pairs of SAT-assignments exist at small x, and around x=1/2, but they do not exist at intermediate values of x. This result is consistent with the clustering scenario which is at the heart of the recent heuristic analysis of satisfiability using statistical physics analysis (the cavity method), and its algorithmic counterpart (the survey propagation algorithm). The method uses elementary probabilistic arguments (first and second moment methods), and might be useful in other problems of computational and physical interest where similar phenomena appear.

Submitted 19 September, 2007; v1 submitted 2 June, 2005; originally announced June 2005.

Journal ref: Theoretical Computer Science 393 (2008) 260-279

arXiv:cond-mat/0504070 [pdf, ps, other]

doi 10.1103/PhysRevLett.94.197205

Clustering of solutions in the random satisfiability problem

Authors: M. Mezard, T. Mora, R. Zecchina

Abstract: Using elementary rigorous methods we prove the existence of a clustered phase in the random $K$-SAT problem, for $K\geq 8$. In this phase the solutions are grouped into clusters which are far away from each other. The results are in agreement with previous predictions of the cavity method and give a rigorous confirmation to one of its main building blocks. It can be generalized to other systems of both physical and computational interest.

Submitted 4 April, 2005; originally announced April 2005.

Comments: 4 pages, 1 figure

Journal ref: Phys. Rev. Lett. 94, 197205 (2005)

arXiv:cond-mat/0312483 [pdf, ps, other]

doi 10.1088/1742-5468/2004/06/P06007

Survey Propagation as local equilibrium equations

Authors: A. Braunstein, R. Zecchina

Abstract: It has been shown experimentally that a decimation algorithm based on Survey Propagation (SP) equations allows one to solve efficiently some combinatorial problems over random graphs. We show that these equations can be derived as sum-product equations for the computation of marginals in an extended space where the variables are allowed to take an additional value -- $*$ -- when they are not forced by the combinatorial constraints. An appropriate "local equilibrium condition" cost/energy function is introduced and its entropy is shown to coincide with the expected logarithm of the number of clusters of solutions as computed by SP. These results may help to clarify the geometrical notion of clusters assumed by SP for the random K-SAT or random graph coloring (where it is conjectured to be exact) and help to explain which kind of clustering operation or approximation is enforced in general/small sized models in which it is known to be inexact.

Submitted 8 June, 2004; v1 submitted 18 December, 2003; originally announced December 2003.

Comments: 13 pages, 3 figures

Journal ref: J. Stat. Mech., P06007 (2004)

arXiv:cs/0309020 [pdf]

Threshold values of Random K-SAT from the cavity method

Authors: Stephan Mertens, Marc Mezard, Riccardo Zecchina

Abstract: Using the cavity equations of \cite{mezard:parisi:zecchina:02,mezard:zecchina:02}, we derive the various threshold values for the number of clauses per variable of the random $K$-satisfiability problem, generalizing the previous results to $K \ge 4$. We also give an analytic solution of the equations, and some closed expressions for these thresholds, in an expansion around large $K$. The stability of the solution is also computed. For any $K$, the satisfiability threshold is found to be in the stable region of the solution, which adds further credit to the conjecture that this computation gives the exact satisfiability threshold.

Submitted 24 February, 2005; v1 submitted 12 September, 2003; originally announced September 2003.

Comments: 38 pages; extended explanations and derivations; this version is going to appear in Random Structures & Algorithms

ACM Class: F.2.0; G.2.0

arXiv:cs/0212002 [pdf, ps, other]

Survey propagation: an algorithm for satisfiability

Authors: A. Braunstein, M. Mezard, R. Zecchina

Abstract: We study the satisfiability of randomly generated formulas formed by $M$ clauses of exactly $K$ literals over $N$ Boolean variables. For a given value of $N$ the problem is known to be most difficult with $α=M/N$ close to the experimental threshold $α_c$ separating the region where almost all formulas are SAT from the region where all formulas are UNSAT. Recent results from a statistical physics analysis suggest that the difficulty is related to the existence of a clustering phenomenon of the solutions when $α$ is close to (but smaller than) $α_c$. We introduce a new type of message passing algorithm which allows one to find efficiently a satisfiable assignment of the variables in the difficult region. This algorithm is iterative and composed of two main parts. The first is a message-passing procedure which generalizes the usual methods like Sum-Product or Belief Propagation: it passes messages that are surveys over clusters of the ordinary messages. The second part uses the detailed probabilistic information obtained from the surveys in order to fix variables and simplify the problem. Eventually, the simplified problem that remains is solved by a conventional heuristic.

Submitted 4 April, 2006; v1 submitted 4 December, 2002; originally announced December 2002.

Comments: 19 pages, 6 figures

ACM Class: G.3

Journal ref: Random Structures and Algorithms 27, 201-226 (2005)

arXiv:cond-mat/0212451 [pdf, ps, other]

Constraint Satisfaction by Survey Propagation

Authors: A. Braunstein, M. Mezard, M. Weigt, R. Zecchina

Abstract: Survey Propagation is an algorithm designed for solving typical instances of random constraint satisfiability problems. It has been successfully tested on random 3-SAT and random $G(n,\frac{c}{n})$ graph 3-coloring, in the hard region of the parameter space. Here we provide a generic formalism which applies to a wide class of discrete Constraint Satisfaction Problems.

Submitted 27 September, 2003; v1 submitted 18 December, 2002; originally announced December 2002.

Comments: 8 pages, 5 figures

Journal ref: Advances in Neural Information Processing Systems. Vol 9. Oxford University Press; 2005. 424

arXiv:cond-mat/0208460 [pdf, ps, other]

doi 10.1103/PhysRevLett.89.268701

Coloring random graphs

Authors: R. Mulet, A. Pagnani, M. Weigt, R. Zecchina

Abstract: We study the graph coloring problem over random graphs of finite average connectivity $c$. Given a number $q$ of available colors, we find that graphs with low connectivity admit almost always a proper coloring whereas graphs with high connectivity are uncolorable. Depending on $q$, we find the precise value of the critical average connectivity $c_q$. Moreover, we show that below $c_q$ there exists a clustering phase $c\in [c_d,c_q]$ in which ground states spontaneously divide into an exponential number of clusters and where the proliferation of metastable states is responsible for the onset of complexity in local search algorithms.

Submitted 28 October, 2002; v1 submitted 23 August, 2002; originally announced August 2002.

Comments: 4 pages, 1 figure, version to app. in PRL

Journal ref: Phys. Rev. Lett. 89, 268701 (2002)

arXiv:cond-mat/0207140 [pdf, ps, other]

Alternative solutions to diluted p-spin models and XORSAT problems

Authors: M. Mezard, F. Ricci-Tersenghi, R. Zecchina

Abstract: We derive analytical solutions for p-spin models with finite connectivity at zero temperature. These models are the statistical mechanics equivalent of p-XORSAT problems in theoretical computer science. We give a full characterization of the phase diagram: location of the phase transitions (static and dynamic), together with a description of the clustering phenomenon taking place in configurational space. We use two alternative methods: the cavity approach and a rigorous derivation.

Submitted 19 September, 2002; v1 submitted 4 July, 2002; originally announced July 2002.

Comments: 14 pages, 14 figures. v3: small errors corrected, simpler notation used

Journal ref: J. Stat. Phys. 111, 505 (2003)

arXiv:cond-mat/0112142 [pdf, ps, other]

doi 10.1103/PhysRevLett.88.178701

Boosting search by rare events

Authors: Andrea Montanari, Riccardo Zecchina

Abstract: Randomized search algorithms for hard combinatorial problems exhibit a large variability of performances. We study the different types of rare events which occur in such out-of-equilibrium stochastic processes and we show how they cooperate in determining the final distribution of running times. As a byproduct of our analysis we show how search algorithms are optimized by random restarts.

Submitted 19 December, 2001; v1 submitted 8 December, 2001; originally announced December 2001.

Comments: 4 pages, 3 eps figures. References updated

Journal ref: Phys. Rev. Lett. 88, 178701 (2002)

arXiv:cond-mat/0111153 [pdf, ps, other]

doi 10.1103/PhysRevLett.88.188701

Hiding solutions in random satisfiability problems: A statistical mechanics approach

Authors: W. Barthel, A. K. Hartmann, M. Leone, F. Ricci-Tersenghi, M. Weigt, R. Zecchina

Abstract: A major problem in evaluating stochastic local search algorithms for NP-complete problems is the need for a systematic generation of hard test instances having previously known properties of the optimal solutions. On the basis of statistical mechanics results, we propose random generators of hard and satisfiable instances for the 3-satisfiability problem (3SAT). The design of the hardest problem instances is based on the existence of a first order ferromagnetic phase transition and the glassy nature of excited states. The analytical predictions are corroborated by numerical results obtained from complete as well as stochastic local algorithms.

Submitted 27 March, 2002; v1 submitted 9 November, 2001; originally announced November 2001.

Comments: 5 pages, 4 figures, revised version to app. in PRL

Journal ref: Phys. Rev. Lett. 88, 188701 (2002)

arXiv:cond-mat/0103328 [pdf, ps, other]

doi 10.1103/PhysRevLett.87.127209

Exact solutions for diluted spin glasses and optimization problems

Authors: S. Franz, M. Leone, F. Ricci-Tersenghi, R. Zecchina

Abstract: We study the low temperature properties of p-spin glass models with finite connectivity and of some optimization problems. Using a one-step functional replica symmetry breaking Ansatz we can solve exactly the saddle-point equations for graphs with uniform connectivity. The resulting ground state energy is in perfect agreement with numerical simulations. For fluctuating connectivity graphs, the same Ansatz can be used in a variational way: For p-spin models (known as p-XOR-SAT in computer science) it provides the exact configurational entropy together with the dynamical and static critical connectivities (for p=3, γ_d=0.818 and γ_s=0.918 resp.), whereas for hard optimization problems like 3-SAT or Bicoloring it provides new upper bounds for their critical thresholds (γ_c^{var}=4.396 and γ_c^{var}=2.149 resp.).

Submitted 15 August, 2001; v1 submitted 15 March, 2001; originally announced March 2001.

Comments: 4 pages, 1 figure, accepted for publication in PRL

Journal ref: Phys. Rev. Lett. 87 (2001) 127209

arXiv:cond-mat/0011181 [pdf, ps, other]

doi 10.1103/PhysRevE.63.026702

Simplest random K-satisfiability problem

Authors: F. Ricci-Tersenghi, M. Weigt, R. Zecchina

Abstract: We study a simple and exactly solvable model for the generation of random satisfiability problems. These consist of $γN$ random boolean constraints which are to be satisfied simultaneously by $N$ logical variables. In statistical-mechanics language, the considered model can be seen as a diluted p-spin model at zero temperature. While such problems become extraordinarily hard to solve by local search methods in a large region of the parameter space, still at least one solution may be superimposed by construction. The statistical properties of the model can be studied exactly by the replica method and each single instance can be analyzed in polynomial time by a simple global solution method. The geometrical/topological structures responsible for dynamic and static phase transitions as well as for the onset of computational complexity in local search methods are thoroughly analyzed. Numerical analysis on very large samples allows for a precise characterization of the critical scaling behaviour.

Submitted 21 December, 2000; v1 submitted 10 November, 2000; originally announced November 2000.

Comments: 14 pages, 5 figures, to appear in Phys. Rev. E (Feb 2001). v2: minor errors and references corrected

Journal ref: Phys. Rev. E 63, 026702 (2001)

Showing 1–50 of 50 results for author: Zecchina, R