-
Sparse generative modeling via parameter-reduction of Boltzmann machines: application to protein-sequence families
Authors:
Pierre Barrat-Charlaix,
Anna Paola Muntoni,
Kai Shimagaki,
Martin Weigt,
Francesco Zamponi
Abstract:
Boltzmann machines (BM) are widely used as generative models. For example, pairwise Potts models (PM), which are instances of the BM class, provide accurate statistical models of families of evolutionarily related protein sequences. Their parameters are the local fields, which describe site-specific patterns of amino-acid conservation, and the two-site couplings, which mirror the coevolution between pairs of sites. This coevolution reflects structural and functional constraints acting on protein sequences during evolution. The most conservative choice to describe the coevolution signal is to include all possible two-site couplings in the PM. This choice, typical of what is known as Direct Coupling Analysis, has been successful in predicting residue contacts in the three-dimensional structure and mutational effects, and in generating new functional sequences. However, the resulting PM suffers from important over-fitting effects: many couplings are small, noisy and hardly interpretable, and the PM is close to a critical point, meaning that it is highly sensitive to small parameter perturbations. In this work, we introduce a general parameter-reduction procedure for BMs, via a controlled iterative decimation of the less statistically significant couplings, identified by an information-based criterion that selects either weak or statistically unsupported couplings. For several protein families, our procedure allows one to remove more than $90\%$ of the PM couplings while preserving the predictive and generative properties of the original dense PM, and the resulting model is far away from criticality, hence more robust to noise.
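To make the two ingredients concrete, here is a minimal NumPy sketch of the Potts energy with fields $h_i$ and couplings $J_{ij}$, plus one decimation step that switches off a fraction of the couplings. It is not the authors' procedure: the ranking score (the Frobenius norm of each $q \times q$ coupling block) is a hypothetical stand-in for the information-based criterion described above, and the refitting of the remaining parameters between decimation steps is omitted.

```python
import numpy as np

def potts_energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: length-L integer array with states in {0, ..., q-1}
    h:   (L, q) array of local fields
    J:   (L, L, q, q) array of couplings; only the i < j blocks are read
    """
    L = len(seq)
    e = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            e -= J[i, j, seq[i], seq[j]]
    return e

def decimate_step(J, active, fraction):
    """Set a given fraction of the still-active couplings J_ij to zero.

    Hypothetical score: Frobenius norm of the q x q block J_ij (weakest first).
    Refitting the remaining parameters between steps is not shown.
    """
    L = J.shape[0]
    pairs = [(i, j) for i in range(L) for j in range(i + 1, L) if active[i, j]]
    scores = [np.linalg.norm(J[i, j]) for i, j in pairs]
    n_remove = int(fraction * len(pairs))
    for k in np.argsort(scores)[:n_remove]:
        i, j = pairs[k]
        active[i, j] = active[j, i] = False
        J[i, j] = 0.0
        J[j, i] = 0.0
    return J, active
```

Repeating `decimate_step` with refitting in between is what makes the procedure iterative; a weakness-only score like the one above cannot detect couplings that are large but statistically unsupported, which is why the abstract's criterion is information-based.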
Submitted 30 July, 2021; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Aligning graphs and finding substructures by a cavity approach
Authors:
S. Bradde,
A. Braunstein,
H. Mahmoudi,
F. Tria,
M. Weigt,
R. Zecchina
Abstract:
We introduce a new distributed algorithm for aligning graphs or finding substructures within a given graph. It is based on the cavity method and is used to study the maximum-clique and the graph-alignment problems in random graphs. The algorithm makes it possible to analyze large graphs and may find applications in fields such as computational biology. As a proof of concept, we use our algorithm to align the similarity graphs of two interacting protein families involved in bacterial signal transduction, and to predict which proteins of these two families actually interact.
Submitted 1 April, 2010; v1 submitted 12 May, 2009;
originally announced May 2009.
-
Message passing for vertex covers
Authors:
Martin Weigt,
Haijun Zhou
Abstract:
Constructing a minimal vertex cover of a graph can be seen as a prototype for a combinatorial optimization problem under hard constraints. In this paper, we develop and analyze message-passing techniques, namely warning and survey propagation, which serve as efficient heuristic algorithms for solving these computationally hard problems. We also show how previously obtained results on the typical-case behavior of vertex covers of random graphs can be recovered starting from the message-passing equations, and how they can be extended.
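As an illustration of the simpler of the two techniques, here is a hedged sketch of warning propagation for minimum vertex cover. The message convention is an assumption for this sketch: $u_{j\to i}=1$ means "$j$ stays uncovered, so $i$ must be covered", and $j$ sends it only if no other neighbor warns $j$. A vertex receiving two or more warnings is covered in every minimal cover, one warning marks it as a "joker" (covered in some minimal covers), none means uncovered. On trees this iteration converges; on loopy graphs it can oscillate, and the function then raises.

```python
def warning_propagation_vc(adj, max_iter=1000):
    """Warning propagation for minimum vertex cover (sketch).

    adj: dict vertex -> set of neighbors of an undirected graph.
    u[(j, i)] = 1 is a warning from j to i: "j is uncovered, cover i".
    """
    u = {(j, i): 0 for j in adj for i in adj[j]}
    for _ in range(max_iter):
        # synchronous update: j warns i iff no other neighbor k warns j
        new = {(j, i): int(all(u[(k, j)] == 0 for k in adj[j] if k != i))
               for (j, i) in u}
        if new == u:
            break
        u = new
    else:
        raise RuntimeError("warning propagation did not converge")
    status = {}
    for i in adj:
        w = sum(u[(k, i)] for k in adj[i])
        status[i] = "covered" if w >= 2 else ("joker" if w == 1 else "uncovered")
    return status
```

For a star, the center is always covered and the leaves never are; on a path of four vertices every vertex comes out as a joker, matching the three distinct minimum covers of size two.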
Submitted 8 September, 2006; v1 submitted 8 May, 2006;
originally announced May 2006.
-
Threshold values, stability analysis and high-q asymptotics for the coloring problem on random graphs
Authors:
Florent Krzakala,
Andrea Pagnani,
Martin Weigt
Abstract:
We consider the problem of coloring Erdős-Rényi and regular random graphs of finite connectivity using $q$ colors. It has so far been studied using the cavity approach within the so-called one-step replica symmetry breaking (1RSB) ansatz. We derive a general criterion for the validity of this ansatz and, applying it to the ground state, we provide evidence that the 1RSB solution gives exact threshold values $c_q$ for the $q$-COL/UNCOL phase transition. We also study the asymptotic thresholds for $q \gg 1$, finding $c_q = 2q\log q - \log q - 1 + o(1)$ in perfect agreement with rigorous mathematical bounds. Finally, we study the nature of excited states and give a global phase diagram of the problem.
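A quick numerical check of the large-$q$ formula helps place it. The sketch below evaluates $2q\log q - \log q - 1$ (natural logarithms, $o(1)$ dropped) next to the standard first-moment upper bound, obtained by requiring the expected number of proper colorings $q^N (1-1/q)^{cN/2}$ to vanish. The bound is included only for comparison and is not taken from the abstract; the function names are hypothetical.

```python
import math

def c_q_large_q(q):
    """2 q ln q - ln q - 1: the large-q threshold without its o(1) term."""
    return 2 * q * math.log(q) - math.log(q) - 1

def first_moment_bound(q):
    """Average connectivity at which q^N (1 - 1/q)^(cN/2) stops growing:
    c = 2 ln q / (-ln(1 - 1/q)); log1p keeps the small argument accurate."""
    return 2 * math.log(q) / -math.log1p(-1 / q)

for q in (10, 100, 10_000):
    print(q, c_q_large_q(q), first_moment_bound(q))
```

Expanding $-1/\ln(1-1/q) = q - 1/2 + O(1/q)$ shows the first-moment bound is $2q\log q - \log q + O(\log q / q)$, so the asymptotic threshold sits just below it by an amount that tends to 1.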
Submitted 28 July, 2004; v1 submitted 30 March, 2004;
originally announced March 2004.
-
Solving satisfiability problems by fluctuations: The dynamics of stochastic local search algorithms
Authors:
Wolfgang Barthel,
Alexander K. Hartmann,
Martin Weigt
Abstract:
Stochastic local search algorithms are frequently used to numerically solve hard combinatorial optimization or decision problems. We give numerical and approximate analytical descriptions of the dynamics of such algorithms applied to random satisfiability problems. We find two different dynamical regimes, depending on the number of constraints per variable: for low constraint density, the problems are solved efficiently, i.e., in linear time; for higher constraint density, the solution times become exponential. We observe that the dynamical behavior is characterized by a fast equilibration and fluctuations around this equilibrium. If the algorithm runs long enough, an exponentially rare fluctuation towards a solution appears.
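For readers unfamiliar with this class of algorithms, here is a simplified WalkSAT-like sketch: pick a random unsatisfied clause, then with probability `noise` flip a random variable in it, otherwise flip the variable that leaves the fewest clauses unsatisfied. This greedy rule and all names are assumptions for illustration, not the exact variant analyzed in the paper. To guarantee a satisfiable test instance, the clauses are generated around a hidden assignment.

```python
import random

def planted_3sat(n, m, rng):
    """m random 3-clauses that all hold under a hidden assignment."""
    hidden = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        # a literal is (variable, negated); it is true under x iff x[var] != negated
        clause = [(v, rng.random() < 0.5) for v in rng.sample(range(n), 3)]
        if any(hidden[v] != neg for v, neg in clause):
            clauses.append(clause)
    return clauses

def is_sat(clause, x):
    return any(x[v] != neg for v, neg in clause)

def walksat_like(clauses, n, noise=0.5, max_flips=100_000, seed=0):
    """Return (assignment, flips) on success, (None, max_flips) otherwise."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n)]

    def unsat_after_flip(v):
        x[v] = not x[v]
        count = sum(not is_sat(c, x) for c in clauses)
        x[v] = not x[v]
        return count

    for flip in range(max_flips):
        unsat = [c for c in clauses if not is_sat(c, x)]
        if not unsat:
            return x, flip
        clause = rng.choice(unsat)
        if rng.random() < noise:
            v = rng.choice(clause)[0]  # random-walk move
        else:
            v = min((var for var, _ in clause), key=unsat_after_flip)  # greedy move
        x[v] = not x[v]
    return None, max_flips
```

Recording the number of unsatisfied clauses per flip in such a loop is the natural way to observe the fast equilibration and the fluctuations around it described above.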
Submitted 7 May, 2003; v1 submitted 15 January, 2003;
originally announced January 2003.
-
Constraint Satisfaction by Survey Propagation
Authors:
A. Braunstein,
M. Mezard,
M. Weigt,
R. Zecchina
Abstract:
Survey Propagation is an algorithm designed for solving typical instances of random constraint satisfiability problems. It has been successfully tested on random 3-SAT and random $G(n,\frac{c}{n})$ graph 3-coloring, in the hard region of the parameter space. Here we provide a generic formalism which applies to a wide class of discrete Constraint Satisfaction Problems.
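The abstract describes the formalism only generically. As a hedged illustration for one of the test cases it mentions, the sketch below iterates the widely used SP equations for random K-SAT, where the survey $\eta_{a\to i}$ is the probability that clause $a$ warns variable $i$ to satisfy it. For each other variable $j$ of $a$, the products over $j$'s other clauses are split by whether $j$ appears there with the same sign as in $a$ or the opposite one. Names are illustrative, clauses are assumed to contain distinct variables, and the decimation step that uses the surveys to fix variables is not shown.

```python
import random

def survey_propagation(clauses, n, max_iter=1000, tol=1e-8, seed=0):
    """SP fixed-point iteration for K-SAT; returns eta[(a, i)].

    A clause is a list of (variable, negated) literals with distinct variables.
    """
    rng = random.Random(seed)
    occ = [[] for _ in range(n)]  # occ[j] = [(clause index, negated?)]
    for a, clause in enumerate(clauses):
        for j, neg in clause:
            occ[j].append((a, neg))
    eta = {(a, i): rng.random() for a, c in enumerate(clauses) for i, _ in c}
    for _ in range(max_iter):
        delta = 0.0
        for a, clause in enumerate(clauses):
            for i, _ in clause:
                prod = 1.0
                for j, neg_ja in clause:
                    if j == i:
                        continue
                    p_same = p_opp = 1.0  # products of (1 - eta_{b->j}) over b != a
                    for b, neg_jb in occ[j]:
                        if b == a:
                            continue
                        if neg_jb == neg_ja:
                            p_same *= 1.0 - eta[(b, j)]
                        else:
                            p_opp *= 1.0 - eta[(b, j)]
                    pi_u = (1.0 - p_opp) * p_same  # j is pushed to violate a
                    pi_s = (1.0 - p_same) * p_opp  # j is pushed to satisfy a
                    pi_0 = p_same * p_opp          # j is unconstrained
                    total = pi_u + pi_s + pi_0
                    prod *= pi_u / total if total > 0 else 0.0
                delta = max(delta, abs(prod - eta[(a, i)]))
                eta[(a, i)] = prod
        if delta < tol:
            return eta
    raise RuntimeError("SP did not converge")
```

A unit clause always sends a certain warning ($\eta = 1$), and a clause whose other variables appear nowhere else sends none, which makes both limits easy to check by hand.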
Submitted 27 September, 2003; v1 submitted 18 December, 2002;
originally announced December 2002.
-
Coloring random graphs
Authors:
R. Mulet,
A. Pagnani,
M. Weigt,
R. Zecchina
Abstract:
We study the graph coloring problem over random graphs of finite average connectivity $c$. Given a number $q$ of available colors, we find that graphs with low connectivity almost always admit a proper coloring, whereas graphs with high connectivity are uncolorable. Depending on $q$, we find the precise value of the critical average connectivity $c_q$. Moreover, we show that below $c_q$ there exists a clustering phase $c\in [c_d,c_q]$ in which ground states spontaneously divide into an exponential number of clusters and where the proliferation of metastable states is responsible for the onset of complexity in local search algorithms.
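To make the setting concrete, the sketch below draws a random graph in which each pair of vertices is linked with probability $c/n$ (so the average connectivity is close to $c$) and decides $q$-colorability by exhaustive backtracking. The function names and the highest-degree-first ordering are illustrative choices; exact search like this is exponential in the worst case and only practical for small graphs.

```python
import random

def random_graph(n, c, rng):
    """Random graph G(n, c/n): each pair is linked independently with prob. c/n."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < c / n:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def q_color(adj, q):
    """Exact backtracking search; returns a proper coloring dict or None."""
    order = sorted(adj, key=lambda v: -len(adj[v]))  # high-degree vertices first
    color = {}

    def extend(k):
        if k == len(order):
            return True
        v = order[k]
        used = {color[u] for u in adj[v] if u in color}
        for col in range(q):
            if col not in used:
                color[v] = col
                if extend(k + 1):
                    return True
                del color[v]
        return False

    return dict(color) if extend(0) else None
```

For example, `q_color(random_graph(12, 2.0, random.Random(0)), 3)` returns a proper 3-coloring or `None`; a triangle is 3- but not 2-colorable, and the complete graph on four vertices needs four colors.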
Submitted 28 October, 2002; v1 submitted 23 August, 2002;
originally announced August 2002.
-
Computational complexity arising from degree correlations in networks
Authors:
Alexei Vazquez,
Martin Weigt
Abstract:
We apply a Bethe-Peierls approach to statistical-mechanics models defined on random networks of arbitrary degree distribution and arbitrary correlations between the degrees of neighboring vertices. Using the NP-hard optimization problem of finding minimal vertex covers on these graphs, we show that such correlations may lead to a qualitatively different solution structure as compared to uncorrelated networks. This results in a higher complexity of the network in a computational sense: Simple heuristic algorithms fail to find a minimal vertex cover in the highly correlated case, whereas uncorrelated networks seem to be simple from the point of view of combinatorial optimization.
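One common single-number summary of correlations between the degrees of neighboring vertices is the degree assortativity coefficient: the Pearson correlation of the degrees found at the two ends of an edge. The abstract works with general degree correlations rather than this coefficient, so the sketch below is a swapped-in illustration with hypothetical names. Negative values mean hubs tend to attach to low-degree vertices; positive values mean similar degrees tend to be linked.

```python
import math

def degree_assortativity(edges):
    """Pearson correlation of the degrees at the two ends of an edge,
    counting every undirected edge in both directions."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    if vx == 0 or vy == 0:
        return math.nan  # all degrees equal: correlation undefined
    return cov / math.sqrt(vx * vy)
```

A star is perfectly disassortative (coefficient $-1$), while a disjoint edge plus a triangle, where every edge joins vertices of equal degree, gives $+1$.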
Submitted 16 November, 2002; v1 submitted 1 July, 2002;
originally announced July 2002.
-
Hiding solutions in random satisfiability problems: A statistical mechanics approach
Authors:
W. Barthel,
A. K. Hartmann,
M. Leone,
F. Ricci-Tersenghi,
M. Weigt,
R. Zecchina
Abstract:
A major problem in evaluating stochastic local search algorithms for NP-complete problems is the need for a systematic generation of hard test instances having previously known properties of the optimal solutions. On the basis of statistical mechanics results, we propose random generators of hard and satisfiable instances for the 3-satisfiability problem (3SAT). The design of the hardest problem instances is based on the existence of a first-order ferromagnetic phase transition and the glassy nature of excited states. The analytical predictions are corroborated by numerical results obtained from complete as well as stochastic local search algorithms.
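The basic idea of hiding a solution can be sketched as follows: choose a hidden assignment and accept a random candidate clause with a probability that depends on how many of its literals ($k = 1, 2, 3$) are true under that assignment, never accepting $k = 0$. The `weights` argument is a hypothetical parametrization, not the paper's values. With equal weights, literals agreeing with the hidden assignment are over-represented (a fraction $4/7$ on average), a bias local search can exploit; counting the 3, 3 and 1 sign patterns with $k = 1, 2, 3$ shows this first-order bias vanishes when $w_1 = w_2 + w_3$.

```python
import random

def planted_3sat_weighted(n, m, weights, seed=0):
    """m random 3-clauses satisfied by a hidden assignment.

    weights[k - 1]: acceptance probability of a candidate clause with k
    literals true under the hidden assignment (k = 1, 2, 3); k = 0 is
    always rejected, so the hidden assignment is a solution.
    """
    rng = random.Random(seed)
    hidden = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        # a literal is (variable, negated); it is true iff hidden[var] != negated
        clause = [(v, rng.random() < 0.5) for v in rng.sample(range(n), 3)]
        k = sum(hidden[v] != neg for v, neg in clause)
        if k > 0 and rng.random() < weights[k - 1]:
            clauses.append(clause)
    return hidden, clauses

def agreement_fraction(hidden, clauses):
    """Fraction of literals that are true under the hidden assignment."""
    lits = [hidden[v] != neg for clause in clauses for v, neg in clause]
    return sum(lits) / len(lits)
```

For instance, `weights=(1, 1, 1)` gives plain rejection sampling, while `weights=(1, 1, 0)` satisfies $w_1 = w_2 + w_3$ and brings the agreement fraction back to about one half.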
Submitted 27 March, 2002; v1 submitted 9 November, 2001;
originally announced November 2001.
-
Simplest random K-satisfiability problem
Authors:
F. Ricci-Tersenghi,
M. Weigt,
R. Zecchina
Abstract:
We study a simple and exactly solvable model for the generation of random satisfiability problems. These consist of $\gamma N$ random Boolean constraints which are to be satisfied simultaneously by $N$ logical variables. In statistical-mechanics language, the considered model can be seen as a diluted p-spin model at zero temperature. While such problems become extraordinarily hard to solve by local search methods in a large region of the parameter space, at least one solution can still be planted by construction. The statistical properties of the model can be studied exactly by the replica method, and each single instance can be analyzed in polynomial time by a simple global solution method. The geometrical/topological structures responsible for dynamic and static phase transitions, as well as for the onset of computational complexity in local search methods, are thoroughly analyzed. Numerical analysis on very large samples allows for a precise characterization of the critical scaling behaviour.
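A diluted p-spin constraint $s_i s_j s_k = J$ becomes, with $s = (-1)^x$, a parity check $x_i \oplus x_j \oplus x_k = b$. Assuming this reading of the model, the polynomial-time global method can be illustrated by Gaussian elimination over GF(2), sketched below with rows stored as Python integer bitmasks. A solution is planted by computing each right-hand side from a hidden assignment. All names are hypothetical.

```python
import random

def planted_xorsat(n, m, k=3, seed=0):
    """m random parity checks on k variables each, consistent with a hidden x."""
    rng = random.Random(seed)
    hidden = [rng.randrange(2) for _ in range(n)]
    eqs = []
    for _ in range(m):
        vs = rng.sample(range(n), k)
        eqs.append((vs, sum(hidden[v] for v in vs) % 2))
    return hidden, eqs

def solve_xorsat(equations, n):
    """Gaussian elimination over GF(2); returns one solution or None."""
    pivots = {}  # pivot column -> (row bitmask, rhs), kept fully reduced
    for vs, b in equations:
        row = 0
        for v in vs:
            row ^= 1 << v
        for col, (prow, pb) in pivots.items():
            if row >> col & 1:  # eliminate existing pivot columns
                row ^= prow
                b ^= pb
        if row == 0:
            if b:
                return None  # derived 0 = 1: unsatisfiable
            continue  # redundant equation
        col = row.bit_length() - 1
        for c, (prow, pb) in list(pivots.items()):
            if prow >> col & 1:  # clear the new pivot column from older rows
                pivots[c] = (prow ^ row, pb ^ b)
        pivots[col] = (row, b)
    x = [0] * n  # free variables set to 0
    for col, (row, b) in pivots.items():
        x[col] = b
    return x
```

The cost is polynomial (at most $n$ pivot rows, each update a bitwise XOR), in sharp contrast with local search, which can get trapped on the same instances.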
Submitted 21 December, 2000; v1 submitted 10 November, 2000;
originally announced November 2000.
-
Typical solution time for a vertex-covering algorithm on finite-connectivity random graphs
Authors:
Martin Weigt,
Alexander K. Hartmann
Abstract:
In this letter, we analytically describe the typical solution time needed by a backtracking algorithm to solve the vertex-cover problem on finite-connectivity random graphs. We find two different transitions: The first one is algorithm-dependent and marks the dynamical transition from linear to exponential solution times. The second one gives the maximum computational complexity, and is found exactly at the threshold where the system undergoes an algorithm-independent phase transition in its solvability. Analytical results are corroborated by numerical simulations.
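To illustrate what "solution time" means for a backtracking algorithm, here is a classic bounded search-tree sketch for the decision version of vertex cover: any uncovered edge $(u, v)$ forces $u$ or $v$ into the cover, so the algorithm branches on both choices. The optional `counter` records visited search nodes as a proxy for running time. This is a generic textbook scheme, not the specific algorithm analyzed in the letter.

```python
def has_vertex_cover(edges, k, counter=None):
    """Decide whether the graph given by `edges` has a vertex cover of size <= k.

    counter: optional one-element list accumulating the number of search nodes.
    """
    if counter is not None:
        counter[0] += 1
    if not edges:
        return True  # every edge is covered
    if k == 0:
        return False  # edges remain but no vertices left to use
    u, v = edges[0]
    for w in (u, v):  # one endpoint of the first edge must be in the cover
        rest = [e for e in edges if w not in e]
        if has_vertex_cover(rest, k - 1, counter):
            return True
    return False
```

For example, a triangle needs two vertices, and a star is covered by its center alone.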
Submitted 28 November, 2000; v1 submitted 27 September, 2000;
originally announced September 2000.
-
Statistical mechanics perspective on the phase transition in vertex covering finite-connectivity random graphs
Authors:
Alexander K. Hartmann,
Martin Weigt
Abstract:
The vertex-cover problem is studied for random graphs $G_{N,cN}$ having $N$ vertices and $cN$ edges. Exact numerical results are obtained by a branch-and-bound algorithm. It is found that a transition in the coverability appears at a $c$-dependent threshold $x=x_c(c)$, where $xN$ is the cardinality of the vertex cover. This transition coincides with a sharp peak of the typical numerical effort needed to decide whether or not there exists a cover with $xN$ vertices. For small edge concentrations $c\ll 0.5$, a cluster expansion is performed, giving very accurate results in this regime. These results are extended using methods developed in statistical physics. The so-called annealed approximation reproduces a previously known rigorous bound on $x_c(c)$. The main part of the paper contains an application of the replica method. Within the replica-symmetric ansatz, the threshold $x_c(c)$ and the critical backbone size $b_c(c)$ can be calculated. For $c<e/2$ the results show an excellent agreement with the numerical findings. At average vertex degree $2c=e$, an instability of the simple replica-symmetric solution occurs.
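The annealed approximation can be made explicit with a short first-moment computation, stated here as a standard sketch rather than the paper's derivation. A random set of $xN$ vertices covers a random edge with probability about $1-(1-x)^2$, so the expected number of covers of size $xN$ grows as $e^{N s(x)}$ with $s(x) = -x\ln x - (1-x)\ln(1-x) + c\ln\!\left(1-(1-x)^2\right)$. Where $s(x) < 0$ covers disappear with high probability, so the root of $s$ is a lower bound on $x_c(c)$. Since $s$ is concave, negative near $x = 0$ and positive just below $x = 1$, bisection finds its unique root; the function names are hypothetical.

```python
import math

def annealed_entropy(x, c):
    """(1/N) log of the expected number of vertex covers of size xN in G_{N,cN}."""
    h = -x * math.log(x) - (1 - x) * math.log(1 - x)
    return h + c * math.log(1 - (1 - x) ** 2)

def annealed_threshold(c, tol=1e-12):
    """Root of annealed_entropy(., c) on (0, 1): a lower bound on x_c(c)."""
    lo, hi = 1e-12, 0.999
    if annealed_entropy(lo, c) >= 0 or annealed_entropy(hi, c) <= 0:
        raise ValueError("no sign change on the search interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if annealed_entropy(mid, c) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since more edges make covering harder, the bound increases with $c$, which is easy to confirm by evaluating it for a few values.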
Submitted 21 June, 2000;
originally announced June 2000.
-
The number of guards needed by a museum: A phase transition in vertex covering of random graphs
Authors:
Martin Weigt,
Alexander K. Hartmann
Abstract:
In this letter, we study the NP-complete vertex cover problem on finite-connectivity random graphs. When the allowed size of the cover set is decreased, a discontinuous transition in solvability and typical-case complexity occurs. This transition is characterized by means of exact numerical simulations as well as by analytical replica calculations. The replica-symmetric phase diagram is in excellent agreement with numerical findings up to average connectivity $e$, where replica symmetry becomes locally unstable.
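The abstract does not quote the replica-symmetric threshold itself. The expression commonly cited for this problem at average connectivity $c \le e$ is $x_c(c) = 1 - \frac{2W(c) + W(c)^2}{2c}$, with $W$ the principal branch of the Lambert function; it is included here as an assumption to check against the paper. The sketch computes $W$ by Newton's method so that no special-function library is needed; names are hypothetical.

```python
import math

def lambert_w(z, tol=1e-14):
    """Principal branch W(z) for z >= 0: solves w * exp(w) = z by Newton's method."""
    if z < 0:
        raise ValueError("this sketch only handles z >= 0")
    w = math.log1p(z)  # starts at or above the root; Newton then decreases monotonically
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1))
        w -= step
        if abs(step) < tol:
            break
    return w

def x_c_replica_symmetric(c):
    """Assumed RS threshold of the minimal cover fraction at average degree c."""
    if not 0 < c <= math.e:
        raise ValueError("RS expression assumed valid only for 0 < c <= e")
    w = lambert_w(c)
    return 1 - (2 * w + w * w) / (2 * c)
```

Two sanity checks: for small $c$ the graph is mostly isolated edges, each needing one cover vertex, so $x_c \approx c/2$; at $c = e$, where replica symmetry becomes unstable, $W(e) = 1$ gives $x_c = 1 - 3/(2e) \approx 0.448$.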
Submitted 3 May, 2000; v1 submitted 11 January, 2000;
originally announced January 2000.
-
A variational description of the ground state structure in random satisfiability problems
Authors:
Giulio Biroli,
Remi Monasson,
Martin Weigt
Abstract:
A variational approach to finite connectivity spin-glass-like models is developed and applied to describe the structure of optimal solutions in random satisfiability problems. Our variational scheme accurately reproduces the known replica-symmetric results and also allows for the inclusion of replica symmetry breaking effects. For the 3-SAT problem, we find two transitions as the ratio $\alpha$ of logical clauses per Boolean variable increases. At the first one, $\alpha_s \simeq 3.96$, a non-trivial organization of the solution space in geometrically separated clusters emerges. The multiplicity of these clusters as well as the typical distances between different solutions are calculated. At the second threshold, $\alpha_c \simeq 4.48$, satisfying assignments disappear and a finite fraction $B_0 \simeq 0.13$ of variables are overconstrained and take the same values in all optimal (though unsatisfying) assignments. These values should be compared with $\alpha_c \simeq 4.27$, $B_0 \simeq 0.4$ obtained from numerical experiments on small instances. Within the present variational approach, the SAT-UNSAT transition naturally appears as a mixture of a first- and a second-order transition. For the mixed $2+p$-SAT with $p<2/5$, the behavior is, as expected, much simpler: a unique smooth transition from SAT to UNSAT takes place at $\alpha_c=1/(1-p)$.
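The mixed $2+p$-SAT ensemble can be sketched as follows: at clause density $\alpha$, a formula over $N$ variables has $M = \alpha N$ clauses, a fraction $p$ with three literals and the rest with two. The generator and function names are illustrative; the threshold function simply encodes the result $\alpha_c = 1/(1-p)$ quoted above for $p < 2/5$, which reduces to the 2-SAT value $\alpha_c = 1$ at $p = 0$.

```python
import random

def mixed_2p_sat(n, alpha, p, seed=0):
    """Random 2+p-SAT formula: round(alpha * n) clauses, a fraction p with
    3 literals and the rest with 2; a literal is (variable, negated)."""
    rng = random.Random(seed)
    m = round(alpha * n)
    m3 = round(p * m)
    clauses = []
    for idx in range(m):
        k = 3 if idx < m3 else 2
        clauses.append([(v, rng.random() < 0.5) for v in rng.sample(range(n), k)])
    return clauses

def sat_unsat_threshold_2p(p):
    """alpha_c = 1 / (1 - p), as stated in the abstract for p < 2/5."""
    if not 0 <= p < 0.4:
        raise ValueError("formula stated only for 0 <= p < 2/5")
    return 1 / (1 - p)
```

Sampling formulas just below and just above `sat_unsat_threshold_2p(p)` and checking satisfiability with a complete solver is the usual way to see the smooth transition numerically.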
Submitted 15 November, 1999; v1 submitted 22 July, 1999;
originally announced July 1999.