Showing 1–18 of 18 results for author: Mannelli, S S

Searching in archive cs.
  1. arXiv:2406.01589

    stat.ML cond-mat.dis-nn cs.LG q-bio.NC

    Tilting the Odds at the Lottery: the Interplay of Overparameterisation and Curricula in Neural Networks

    Authors: Stefano Sarao Mannelli, Yaraslau Ivashinka, Andrew Saxe, Luca Saglietti

    Abstract: A wide range of empirical and theoretical works have shown that overparameterisation can amplify the performance of neural networks. According to the lottery ticket hypothesis, overparameterised networks have an increased chance of containing a sub-network that is well-initialised to solve the task at hand. A more parsimonious approach, inspired by animal learning, consists in guiding the learner…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2405.18296

    cs.LG cond-mat.dis-nn stat.ML

    Bias in Motion: Theoretical Insights into the Dynamics of Bias in SGD Training

    Authors: Anchit Jain, Rozhin Nobahari, Aristide Baratin, Stefano Sarao Mannelli

    Abstract: Machine learning systems often acquire biases by leveraging undesired features in the data, impacting accuracy variably across different sub-populations. Current understanding of bias formation mostly focuses on the initial and final stages of learning, leaving a gap in knowledge regarding the transient dynamics. To address this gap, this paper explores the evolution of bias in a teacher-student s…

    Submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2306.10404

    cs.LG cond-mat.dis-nn

    The RL Perceptron: Generalisation Dynamics of Policy Learning in High Dimensions

    Authors: Nishil Patel, Sebastian Lee, Stefano Sarao Mannelli, Sebastian Goldt, Andrew Saxe

    Abstract: Reinforcement learning (RL) algorithms have proven transformative in a range of domains. To tackle real-world domains, these systems often use neural networks to learn policies directly from pixels or other high-dimensional sensory input. By contrast, much theory of RL has focused on discrete state spaces or worst-case analysis, and fundamental questions remain about the dynamics of policy learnin…

    Submitted 2 September, 2023; v1 submitted 17 June, 2023; originally announced June 2023.

    Comments: 10 pages, 7 figures, Preprint

  4. arXiv:2303.01429

    cs.LG

    Optimal transfer protocol by incremental layer defrosting

    Authors: Federica Gerace, Diego Doimo, Stefano Sarao Mannelli, Luca Saglietti, Alessandro Laio

    Abstract: Transfer learning is a powerful tool enabling model training with limited amounts of data. This technique is particularly useful in real-world problems where data availability is often a serious limitation. The simplest transfer learning protocol is based on "freezing" the feature-extractor layers of a network pre-trained on a data-rich source task, and then adapting only the last layers to a dat…

    Submitted 2 March, 2023; originally announced March 2023.

  5. arXiv:2205.15935

    cs.LG cond-mat.dis-nn stat.ML

    Bias-inducing geometries: an exactly solvable data model with fairness implications

    Authors: Stefano Sarao Mannelli, Federica Gerace, Negar Rostamzadeh, Luca Saglietti

    Abstract: Machine learning (ML) may be oblivious to human bias but it is not immune to its perpetuation. Marginalisation and iniquitous group representation are often traceable in the very data used for training, and may be reflected or even enhanced by the learning models. In the present work, we aim at clarifying the role played by data geometry in the emergence of ML bias. We introduce an exactly solvabl…

    Submitted 13 November, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 9 pages + methods + SI

  6. arXiv:2205.09029

    stat.ML cs.LG

    Maslow's Hammer for Catastrophic Forgetting: Node Re-Use vs Node Activation

    Authors: Sebastian Lee, Stefano Sarao Mannelli, Claudia Clopath, Sebastian Goldt, Andrew Saxe

    Abstract: Continual learning - learning new tasks in sequence while maintaining performance on old tasks - remains particularly challenging for artificial neural networks. Surprisingly, the amount of forgetting does not increase with the dissimilarity between the learned tasks, but appears to be worst in an intermediate similarity regime. In this paper we theoretically analyse both a synthetic teacher-stu…

    Submitted 18 May, 2022; originally announced May 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:12455-12477 (2022)

  7. arXiv:2106.08068

    cs.LG cond-mat.dis-nn stat.ML

    An Analytical Theory of Curriculum Learning in Teacher-Student Networks

    Authors: Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe

    Abstract: In humans and animals, curriculum learning -- presenting data in a curated order -- is critical to rapid learning and effective pedagogy. Yet in machine learning, curricula are not widely used and empirically often yield only moderate benefits. This stark difference in the importance of curriculum raises a fundamental theoretical question: when and why does curriculum learning help? In this work,…

    Submitted 12 October, 2022; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: Accepted to NeurIPS 2022

  8. arXiv:2106.05418

    cs.LG cond-mat.dis-nn

    Probing transfer learning with a model of synthetic correlated datasets

    Authors: Federica Gerace, Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe, Lenka Zdeborová

    Abstract: Transfer learning can significantly improve the sample efficiency of neural networks, by exploiting the relatedness between a data-scarce target task and a data-abundant source task. Despite years of successful applications, transfer learning practice often relies on ad-hoc solutions, while theoretical understanding of these procedures is still limited. In the present work, we re-think a solvable…

    Submitted 2 February, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

    Journal ref: Machine Learning: Science and Technology 3.1 (2022): 015030

  9. arXiv:2102.11755

    cond-mat.dis-nn cs.LG stat.ML

    Analytical Study of Momentum-Based Acceleration Methods in Paradigmatic High-Dimensional Non-Convex Problems

    Authors: Stefano Sarao Mannelli, Pierfrancesco Urbani

    Abstract: The optimization step in many machine learning problems rarely relies on vanilla gradient descent but it is common practice to use momentum-based accelerated methods. Despite these algorithms being widely applied to arbitrary loss functions, their behaviour in generically non-convex, high dimensional landscapes is poorly understood. In this work, we use dynamical mean field theory techniques to de…

    Submitted 27 October, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

    Comments: To appear in NeurIPS 2021

  10. arXiv:2009.09422

    q-bio.PE cond-mat.stat-mech cs.AI cs.LG

    Epidemic mitigation by statistical inference from contact tracing data

    Authors: Antoine Baker, Indaco Biazzo, Alfredo Braunstein, Giovanni Catania, Luca Dall'Asta, Alessandro Ingrosso, Florent Krzakala, Fabio Mazza, Marc Mézard, Anna Paola Muntoni, Maria Refinetti, Stefano Sarao Mannelli, Lenka Zdeborová

    Abstract: Contact-tracing is an essential tool in order to mitigate the impact of pandemic such as the COVID-19. In order to achieve efficient and scalable contact-tracing in real time, digital devices can play an important role. While a lot of attention has been paid to analyzing the privacy and ethical risks of the associated mobile applications, so far much less research has been devoted to optimizing th…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: 21 pages, 7 figures

    ACM Class: G.3; G.4; I.2.11; J.3

    Journal ref: PNAS 2021 Vol. 118 No. 32 e2106548118

  11. arXiv:2007.13483

    cs.LG cs.AI

    Post-Workshop Report on Science meets Engineering in Deep Learning, NeurIPS 2019, Vancouver

    Authors: Levent Sagun, Caglar Gulcehre, Adriana Romero, Negar Rostamzadeh, Stefano Sarao Mannelli

    Abstract: Science meets Engineering in Deep Learning took place in Vancouver as part of the Workshop section of NeurIPS 2019. As organizers of the workshop, we created the following report in an attempt to isolate emerging topics and recurring themes that have been presented throughout the event. Deep learning can still be a complex mix of art and engineering despite its tremendous success in recent years.…

    Submitted 29 July, 2020; v1 submitted 25 June, 2020; originally announced July 2020.

    Comments: Report of NeurIPS 2019 workshop SEDL

  12. arXiv:2006.15459

    cs.LG stat.ML

    Optimization and Generalization of Shallow Neural Networks with Quadratic Activation Functions

    Authors: Stefano Sarao Mannelli, Eric Vanden-Eijnden, Lenka Zdeborová

    Abstract: We study the dynamics of optimization and the generalization properties of one-hidden layer neural networks with quadratic activation function in the over-parametrized regime where the layer width $m$ is larger than the input dimension $d$. We consider a teacher-student scenario where the teacher has the same structure as the student with a hidden layer of smaller width $m^*\le m$. We describe…

    Submitted 18 August, 2020; v1 submitted 27 June, 2020; originally announced June 2020.

    Comments: 10 pages, 4 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v33, page 13445--13455, 2020

  13. arXiv:2006.13395

    cs.SI math.OC physics.soc-ph q-bio.PE

    Winning the competition: enhancing counter-contagion in SIS-like epidemic processes

    Authors: Argyris Kalogeratos, Stefano Sarao Mannelli

    Abstract: In this paper we consider the epidemic competition between two generic diffusion processes, where each competing side is represented by a different state of a stochastic process. For this setting, we present the Generalized Largest Reduction in Infectious Edges (gLRIE) dynamic resource allocation strategy to advantage the preferred state against the other. Motivated by social epidemics, we apply t…

    Submitted 23 June, 2020; originally announced June 2020.

    Comments: 4 pages, 3 figures, linked to external 6-page Appendix

    ACM Class: G.3; I.6; I.2.8; J.4

  14. arXiv:2006.06997

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Despite the widespread use of gradient-based algorithms for optimizing high-dimensional non-convex functions, understanding their ability of finding good minima instead of being trapped in spurious ones remains to a large extent an open problem. Here we focus on gradient flow dynamics for phase retrieval from random measurements. When the ratio of the number of measurements over the input dimensio…

    Submitted 12 June, 2020; originally announced June 2020.

    Comments: 9 pages, 5 figures + appendix

    Journal ref: Advances in Neural Information Processing Systems, v22, page 3265--327, 2020

  15. arXiv:2001.00479

    cs.LG cond-mat.dis-nn stat.ML

    Thresholds of descending algorithms in inference problems

    Authors: Stefano Sarao Mannelli, Lenka Zdeborová

    Abstract: We review recent works on analyzing the dynamics of gradient-based algorithms in a prototypical statistical inference problem. Using methods and insights from the physics of glassy systems, these works showed how to understand quantitatively and qualitatively the performance of gradient-based algorithms. Here we review the key results and their interpretation in non-technical terms accessible to a…

    Submitted 4 January, 2020; v1 submitted 2 January, 2020; originally announced January 2020.

    Comments: 8 pages, 4 figures

    Journal ref: J. Stat. Mech. (2020) 034004

  16. arXiv:1907.08226

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Who is Afraid of Big Bad Minima? Analysis of Gradient-Flow in a Spiked Matrix-Tensor Model

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Lenka Zdeborová

    Abstract: Gradient-based algorithms are effective for many machine learning tasks, but despite ample recent effort and some progress, it often remains unclear why they work in practice in optimising high-dimensional non-convex functions and why they find good minima instead of being trapped in spurious ones. Here we present a quantitative theory explaining this behaviour in a spiked matrix-tensor model.…

    Submitted 20 January, 2020; v1 submitted 18 July, 2019; originally announced July 2019.

    Comments: 9 pages, 4 figures + appendix. Appears in Proceedings of the Advances in Neural Information Processing Systems 2019 (NeurIPS 2019)

    Journal ref: Advances in Neural Information Processing Systems, pp. 8676-8686. 2019

  17. arXiv:1902.00139

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Passed & Spurious: Descent Algorithms and Local Minima in Spiked Matrix-Tensor Models

    Authors: Stefano Sarao Mannelli, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: In this work we analyse quantitatively the interplay between the loss landscape and performance of descent algorithms in a prototypical inference problem, the spiked matrix-tensor model. We study a loss function that is the negative log-likelihood of the model. We analyse the number of local minima at a fixed distance from the signal/spike with the Kac-Rice formula, and locate trivialization of th…

    Submitted 20 January, 2020; v1 submitted 31 January, 2019; originally announced February 2019.

    Comments: 12 pages + appendix, 10 figures. Appears in Proceedings of the International Conference on Machine Learning (ICML 2019)

    Journal ref: International Conference on Machine Learning, 4333-4342 (ICML 2019)

  18. arXiv:1812.09066

    cs.LG cond-mat.dis-nn math.ST stat.ML

    Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference

    Authors: Stefano Sarao Mannelli, Giulio Biroli, Chiara Cammarota, Florent Krzakala, Pierfrancesco Urbani, Lenka Zdeborová

    Abstract: Gradient-descent-based algorithms and their stochastic versions have widespread applications in machine learning and statistical inference. In this work we perform an analytic study of the performances of one of them, the Langevin algorithm, in the context of noisy high-dimensional inference. We employ the Langevin algorithm to sample the posterior probability measure for the spiked matrix-tensor…

    Submitted 13 January, 2020; v1 submitted 21 December, 2018; originally announced December 2018.

    Comments: 11 pages and 5 figures + appendix

    Journal ref: Phys. Rev. X 10, 011057 (2020)