Magnitude and angle dynamics in training single ReLU neurons

Neural Netw. 2024 Oct;178:106435. doi: 10.1016/j.neunet.2024.106435. Epub 2024 Jun 22.

Abstract

Understanding the training dynamics of deep ReLU networks is a significant area of interest in deep learning. However, the weight vector dynamics have not been fully elucidated, even for single ReLU neurons. To bridge this gap, we study the training dynamics of the gradient flow w(t) for a single ReLU neuron under the square loss, decomposing it into its magnitude ‖w(t)‖ and angle φ(t) components. Through this decomposition, we establish upper and lower bounds on these components to elucidate the convergence dynamics. Furthermore, we empirically show that our findings extend to general two-layer multi-neuron networks. All theoretical results are generalized to the gradient descent method and rigorously verified through experiments.
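To make the decomposition concrete, the sketch below (not the authors' code) trains a single ReLU neuron on the square loss with gradient descent and tracks the magnitude ‖w(t)‖ and the angle φ(t) between w(t) and a teacher direction v. The teacher-student setup, Gaussian inputs, initialization scale, and step size are illustrative assumptions.

```python
# Minimal sketch of the magnitude/angle decomposition for a single ReLU neuron.
# Assumptions (not from the paper's abstract): teacher-student data, Gaussian
# inputs, small random initialization, fixed learning rate.
import numpy as np

rng = np.random.default_rng(0)
d, n, steps, lr = 10, 1000, 2000, 0.05

v = rng.standard_normal(d)
v /= np.linalg.norm(v)                  # teacher direction (unit norm)
X = rng.standard_normal((n, d))         # Gaussian inputs
y = np.maximum(X @ v, 0.0)              # teacher labels: ReLU(v^T x)

w = 0.1 * rng.standard_normal(d)        # student weights, small init
magnitude, angle = [], []

for t in range(steps):
    pre = X @ w
    out = np.maximum(pre, 0.0)          # student prediction ReLU(w^T x)
    # Gradient of the averaged square loss 1/(2n) * sum_i (out_i - y_i)^2
    grad = X.T @ ((out - y) * (pre > 0)) / n
    w -= lr * grad                      # gradient descent step

    magnitude.append(np.linalg.norm(w))                      # ||w(t)||
    cos = np.clip(w @ v / (np.linalg.norm(w) + 1e-12), -1.0, 1.0)
    angle.append(np.arccos(cos))                              # phi(t)

print(f"final ||w|| = {magnitude[-1]:.3f}, final phi = {angle[-1]:.3f} rad")
```

Plotting the recorded `magnitude` and `angle` trajectories over iterations gives an empirical picture of the two components whose convergence the paper bounds.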

Keywords: Gradient flow; Magnitude and angle dynamics; Single ReLU neurons.

MeSH terms

  • Algorithms
  • Deep Learning
  • Models, Neurological
  • Neural Networks, Computer*
  • Neurons* / physiology