-
Random features and polynomial rules
Authors:
Fabián Aguirre-López,
Silvio Franz,
Mauro Pastore
Abstract:
Random features models play a distinguished role in the theory of deep learning, describing the behavior of neural networks close to their infinite-width limit. In this work, we present a thorough analysis of the generalization performance of random features models for generic supervised learning problems with Gaussian data. Our approach, built with tools from the statistical mechanics of disordered systems, maps the random features model to an equivalent polynomial model, and allows us to plot average generalization curves as functions of the two main control parameters of the problem: the number of random features $N$ and the size $P$ of the training set, both assumed to scale as powers of the input dimension $D$. Our results extend the case of proportional scaling between $N$, $P$ and $D$. They are in accordance with rigorous bounds known for certain particular learning tasks and are in quantitative agreement with numerical experiments performed over many orders of magnitude of $N$ and $P$. We find good agreement also far from the asymptotic limits where $D\to \infty$ and at least one of $P/D^K$ and $N/D^L$ remains finite.
Submitted 15 February, 2024;
originally announced February 2024.
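As a rough illustration of the setting in the abstract above, the following minimal sketch trains a random features model on Gaussian inputs and measures its test error. The linear teacher, the tanh activation, the ridge penalty, and the function name `random_features_test_error` are assumptions made for this example only; they are not taken from the paper, which treats generic learning tasks.

```python
import numpy as np


def random_features_test_error(D=20, P=200, N=100, ridge=1e-3, seed=0):
    """Train a random features model on Gaussian data.

    Returns (test_mse, baseline_mse), where baseline_mse is the error
    of the trivial predictor that always outputs zero.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical linear teacher (an assumption of this sketch).
    teacher = rng.standard_normal(D) / np.sqrt(D)
    X_train = rng.standard_normal((P, D))
    X_test = rng.standard_normal((2000, D))
    y_train = X_train @ teacher
    y_test = X_test @ teacher

    # Fixed random first layer: N features that are never trained.
    W = rng.standard_normal((D, N)) / np.sqrt(D)
    phi_train = np.tanh(X_train @ W)
    phi_test = np.tanh(X_test @ W)

    # Only the readout weights are learned, here by ridge regression.
    gram = phi_train.T @ phi_train + ridge * np.eye(N)
    readout = np.linalg.solve(gram, phi_train.T @ y_train)

    test_mse = float(np.mean((phi_test @ readout - y_test) ** 2))
    baseline_mse = float(np.mean(y_test ** 2))
    return test_mse, baseline_mse
```

Sweeping `N` and `P` at fixed `D` with this function gives empirical generalization curves of the kind the abstract compares against theory, although only for this one assumed task.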
-
Glassy nature of the hard phase in inference problems
Authors:
Fabrizio Antenucci,
Silvio Franz,
Pierfrancesco Urbani,
Lenka Zdeborová
Abstract:
An algorithmically hard phase has been described in a range of inference problems: even if the signal can be reconstructed with a small error from an information theoretic point of view, known algorithms fail unless the noise-to-signal ratio is sufficiently small. This hard phase is typically understood as a metastable branch of the dynamical evolution of message passing algorithms. In this work we study the metastable branch for a prototypical inference problem, low-rank matrix factorization, which presents a hard phase. We show that for noise-to-signal ratios that are below the information theoretic threshold, the posterior measure is composed of an exponential number of metastable glassy states and we compute their entropy, called the complexity. We show that this glassiness extends even slightly below the algorithmic threshold below which the well-known approximate message passing (AMP) algorithm is able to closely reconstruct the signal. Counter-intuitively, we find that the performance of the AMP algorithm is not improved by taking into account the glassy nature of the hard phase. This result provides further evidence that the hard phase in inference problems is algorithmically impenetrable for some deep computational reasons that remain to be uncovered.
Submitted 9 January, 2019; v1 submitted 15 May, 2018;
originally announced May 2018.
-
The edge-disjoint path problem on random graphs by message-passing
Authors:
Fabrizio Altarelli,
Alfredo Braunstein,
Luca Dall'Asta,
Caterina De Bacco,
Silvio Franz
Abstract:
We present a message-passing algorithm to solve the edge-disjoint path problem (EDP) on graphs, incorporating in a single framework both traffic optimization and path length minimization. The min-sum equations for this problem present an exponential computational cost in the number of paths. To overcome this obstacle we propose an efficient implementation by mapping the equations onto a weighted combinatorial matching problem over an auxiliary graph. We perform extensive numerical simulations on random graphs of various types to test the performance both in terms of path length minimization and maximization of the number of accommodated paths. In addition, we test the performance on benchmark instances on various graphs by comparison with state-of-the-art algorithms and results found in the literature. Our message-passing algorithm always outperforms the others in terms of the number of accommodated paths when considering nontrivial instances (otherwise it gives the same trivial results). Remarkably, the largest improvement in performance with respect to the other methods employed is found in the case of benchmarks with meshes, where the validity hypothesis behind message-passing is expected to worsen. In these cases, even though the exact message-passing equations do not converge, by introducing a reinforcement parameter to force convergence towards a suboptimal solution, we were able to always outperform the other algorithms with a peak of 27% performance improvement in terms of accommodated paths. On random graphs, we numerically observe two separated regimes: one in which all paths can be accommodated and one in which this is not possible. We also investigate the behaviour of both the number of paths to be accommodated and their minimum total length.
Submitted 2 March, 2015;
originally announced March 2015.
-
Shortest node-disjoint paths on random graphs
Authors:
Caterina De Bacco,
Silvio Franz,
David Saad,
Chi Ho Yeung
Abstract:
A localized method to distribute paths on random graphs is devised, aimed at finding the shortest paths between given source/destination pairs while avoiding path overlaps at nodes. We propose a method based on message-passing techniques to process global information and distribute paths optimally. Statistical properties such as scaling with system size and number of paths, average path-length and the transition to the frustrated regime are analysed. The performance of the suggested algorithm is evaluated through a comparison against a greedy algorithm.
Submitted 18 May, 2014; v1 submitted 31 January, 2014;
originally announced January 2014.
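The abstract above evaluates its message-passing method against a greedy algorithm without specifying it. One plausible greedy baseline, sketched below purely as an assumption, routes the source/destination pairs one at a time along a breadth-first shortest path and forbids later paths from reusing any node already taken. The function names and the choice to reserve all endpoints in advance are hypothetical, not the authors' procedure.

```python
from collections import deque


def bfs_path(adj, src, dst, blocked):
    """Shortest path from src to dst that avoids blocked nodes, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent and v not in blocked:
                parent[v] = u
                queue.append(v)
    return None


def greedy_node_disjoint(adj, pairs):
    """Route pairs in the given order; assumes all endpoints are distinct.

    Returns a dict mapping each pair to its path, or to None when the
    pair could not be routed without overlapping an earlier path.
    """
    reserved = {node for pair in pairs for node in pair}
    used = set()
    routes = {}
    for src, dst in pairs:
        # Block nodes of earlier paths and the endpoints of other pairs.
        blocked = (used | reserved) - {src, dst}
        path = bfs_path(adj, src, dst, blocked)
        routes[(src, dst)] = path
        if path is not None:
            used.update(path)
    return routes
```

Because each pair is routed without regard to the pairs that follow, the outcome depends on the routing order, which is the kind of limitation a method that processes global information is meant to overcome.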
-
Dynamics and termination cost of spatially coupled mean-field models
Authors:
Francesco Caltagirone,
Silvio Franz,
Richard Morris,
Lenka Zdeborová
Abstract:
This work is motivated by recent progress in information theory and signal processing where the so-called 'spatially coupled' design of systems leads to considerably better performance. We address relevant open questions about spatially coupled systems through the study of a simple Ising model. In particular, we consider a chain of Curie-Weiss models that are coupled by interactions up to a certain range. It is well known that the pure (uncoupled) Curie-Weiss model undergoes a first order phase transition driven by the magnetic field and that, in the spinodal region, such systems are unable to reach equilibrium in sub-exponential time if initialized in the metastable state. By contrast, the spatially coupled system is able to reach equilibrium even when initialized in the metastable state. The equilibrium phase propagates along the chain in the form of a travelling wave. Here we study the speed of the wave-front and the so-called 'termination cost', i.e., the conditions necessary for the propagation to occur. We reach several interesting conclusions about optimization of the speed and the cost.
Submitted 8 October, 2013;
originally announced October 2013.
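The metastability of the uncoupled Curie-Weiss model mentioned in the abstract above can be seen from its standard mean-field self-consistency equation, m = tanh(β(m + h)) with unit coupling. The sketch below iterates this equation from two initial conditions; it is a deterministic fixed-point iteration chosen for illustration, not the dynamics studied in the paper, and the parameter values are assumptions.

```python
import math


def curie_weiss_fixed_point(beta, h, m0, n_iter=10_000, tol=1e-12):
    """Iterate m <- tanh(beta * (m + h)) from m0 until it stops changing."""
    m = m0
    for _ in range(n_iter):
        m_new = math.tanh(beta * (m + h))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


# Below the critical temperature (beta > 1) with a small positive field,
# the iteration has two stable fixed points.
beta, h = 1.5, 0.05
m_metastable = curie_weiss_fixed_point(beta, h, m0=-1.0)  # stays negative
m_equilibrium = curie_weiss_fixed_point(beta, h, m0=+1.0)  # aligned with h
```

Started at negative magnetization, the iteration remains on the negative branch even though the field favours the positive one, which is the trapped behaviour that spatial coupling is reported to overcome.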
-
Exact solutions for diluted spin glasses and optimization problems
Authors:
S. Franz,
M. Leone,
F. Ricci-Tersenghi,
R. Zecchina
Abstract:
We study the low temperature properties of p-spin glass models with finite connectivity and of some optimization problems. Using a one-step functional replica symmetry breaking Ansatz we can solve exactly the saddle-point equations for graphs with uniform connectivity. The resulting ground state energy is in perfect agreement with numerical simulations. For fluctuating connectivity graphs, the same Ansatz can be used in a variational way: for p-spin models (known as p-XOR-SAT in computer science) it provides the exact configurational entropy together with the dynamical and static critical connectivities (for p=3, γ_d=0.818 and γ_s=0.918, respectively), whereas for hard optimization problems like 3-SAT or Bicoloring it provides new upper bounds for their critical thresholds (γ_c^{var}=4.396 and γ_c^{var}=2.149, respectively).
Submitted 15 August, 2001; v1 submitted 15 March, 2001;
originally announced March 2001.