Grokking (machine learning)
Revision as of 11:36, 4 June 2024
In machine learning, grokking is a neologism describing a transition from memorization to generalization that occurs many training iterations after the interpolation threshold, following a long period of seemingly little progress.[1][2]
The term derives from the word grok, coined by Robert A. Heinlein in his 1961 novel Stranger in a Strange Land.
Grokking can be understood as a phase transition occurring during the training process.[3] Although grokking was initially thought of as a phenomenon of relatively shallow models, it has since been observed in deep models and remains the subject of active research.[4]
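Grokking is typically studied on small algorithmic datasets such as modular addition, where a model first fits the training split perfectly and only much later reaches comparable validation accuracy.[2] The following is a minimal illustrative sketch, not taken from any of the cited papers: it builds a modular-addition dataset of the kind used by Power et al. and measures the gap, in training steps, between fitting the training set and generalizing. The function names and synthetic accuracy curves are assumptions for illustration only.

```python
import numpy as np

def modular_addition_dataset(p=97, train_frac=0.5, seed=0):
    """Build a small algorithmic dataset: inputs (a, b) with a, b < p,
    label (a + b) mod p, randomly split into train/validation halves."""
    pairs = np.array([(a, b) for a in range(p) for b in range(p)])
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(pairs))
    n_train = int(train_frac * len(pairs))
    train, val = idx[:n_train], idx[n_train:]
    return (pairs[train], labels[train]), (pairs[val], labels[val])

def grokking_delay(train_acc, val_acc, threshold=0.99):
    """Steps between the model fitting the training set (memorization)
    and reaching the same accuracy threshold on validation
    (generalization). A large positive delay is the grokking signature.
    Assumes both curves eventually cross the threshold."""
    t_fit = int(np.argmax(np.asarray(train_acc) >= threshold))
    t_gen = int(np.argmax(np.asarray(val_acc) >= threshold))
    return t_gen - t_fit
```

For example, a run whose training accuracy saturates at step 1 while validation accuracy only saturates at step 3 has a grokking delay of 2; in reported grokking experiments this gap can span many thousands of optimization steps.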
References
- ^ Pearce, Adam; Ghandeharioun, Asma; Hussein, Nada; Thain, Nithum; Wattenberg, Martin; Dixon, Lucas (August 2023). "Do Machine Learning Models Memorize or Generalize?". pair.withgoogle.com. Retrieved 2024-06-04.
- ^ Power, Alethea; Burda, Yuri; Edwards, Harri; Babuschkin, Igor; Misra, Vedant (2022-01-06). Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets. arXiv:2201.02177. Retrieved 2024-06-04.
- ^ Liu, Ziming; Kitouni, Ouail; Nolte, Niklas; Michaud, Eric J.; Tegmark, Max; Williams, Mike (2022-10-14). Towards Understanding Grokking: An Effective Theory of Representation Learning. arXiv:2205.10343. Retrieved 2024-06-04.
- ^ Fan, Simin; Pascanu, Razvan; Jaggi, Martin (2024-05-29). Deep Grokking: Would Deep Neural Networks Generalize Better?. arXiv:2405.19454. Retrieved 2024-06-04.