• Editors' Suggestion
  • Open Access

Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models

Jason W. Rocks and Pankaj Mehta
Phys. Rev. Research 4, 013201 – Published 15 March 2022

Abstract

The bias-variance trade-off is a central concept in supervised learning. In classical statistics, increasing the complexity of a model (e.g., number of parameters) reduces bias but also increases variance. Until recently, it was commonly believed that optimal performance is achieved at intermediate model complexities which strike a balance between bias and variance. Modern deep learning methods flout this dogma, achieving state-of-the-art performance using “overparameterized models” where the number of fit parameters is large enough to perfectly fit the training data. As a result, understanding bias and variance in overparameterized models has emerged as a fundamental problem in machine learning. Here, we use methods from statistical physics to derive analytic expressions for bias and variance in two minimal models of overparameterization (linear regression and two-layer neural networks with nonlinear data distributions), allowing us to disentangle properties stemming from the model architecture and random sampling of data. In both models, increasing the number of fit parameters leads to a phase transition where the training error goes to zero and the test error diverges as a result of the variance (while the bias remains finite). Beyond this threshold, the test error of the two-layer neural network decreases due to a monotonic decrease in both the bias and variance as opposed to the classical bias-variance trade-off. We also show that in contrast with classical intuition, overparameterized models can overfit even in the absence of noise and exhibit bias even if the student and teacher models match. We synthesize these results to construct a holistic understanding of generalization error and the bias-variance trade-off in overparameterized models and relate our results to random matrix theory.
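
The interpolation threshold described above can be illustrated numerically with a random-features model in the spirit of the paper's two-layer setup. The sketch below is only a toy demonstration, not the authors' analytic calculation: the linear teacher, tanh activation, sample sizes, and the use of a pseudoinverse (minimum-norm) fit are illustrative assumptions. It fits a least-squares readout on p frozen random nonlinear features and prints training and test error as p crosses the number of training samples.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and noise level (assumptions, not taken from the paper).
n_train, n_test, d = 100, 2000, 50
sigma = 0.1

# Linear "teacher": y = x . beta, with label noise on the training set only.
beta = rng.standard_normal(d) / np.sqrt(d)
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ beta + sigma * rng.standard_normal(n_train)
y_test = X_test @ beta

def random_feature_errors(p, n_draws=5):
    """Average train/test MSE of the minimum-norm least-squares fit
    on p random tanh features (a frozen hidden layer)."""
    tr, te = 0.0, 0.0
    for _ in range(n_draws):
        W = rng.standard_normal((d, p)) / np.sqrt(d)    # random hidden weights
        Z_tr, Z_te = np.tanh(X_train @ W), np.tanh(X_test @ W)
        a = np.linalg.pinv(Z_tr) @ y_train              # pseudoinverse = min-norm solution
        tr += np.mean((Z_tr @ a - y_train) ** 2) / n_draws
        te += np.mean((Z_te @ a - y_test) ** 2) / n_draws
    return tr, te

# The training error drops to zero near p = n_train (the interpolation
# threshold), where the test error peaks before decreasing again at larger p.
for p in [20, 50, 90, 100, 110, 150, 300, 1000]:
    tr, te = random_feature_errors(p)
    print(f"p = {p:4d}   train MSE = {tr:.2e}   test MSE = {te:.2e}")

In this toy run the test error blows up as p approaches the number of training samples and then falls again in the overparameterized regime, qualitatively mirroring the divergence at the interpolation threshold and the subsequent monotonic improvement that the article derives analytically.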

  • Received 29 October 2020
  • Revised 9 June 2021
  • Accepted 22 February 2022

DOI:https://doi.org/10.1103/PhysRevResearch.4.013201

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

Published by the American Physical Society

Physics Subject Headings (PhySH)

Statistical Physics & Thermodynamics

Authors & Affiliations

Jason W. Rocks1 and Pankaj Mehta1,2,*

  • 1Department of Physics, Boston University, Boston, Massachusetts 02215, USA
  • 2Faculty of Computing and Data Sciences, Boston University, Boston, Massachusetts 02215, USA

Article Text

Supplemental Material

References
Issue

Vol. 4, Iss. 1 — March–May 2022

Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 4.0 International license. This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.
