To understand the learning dynamics of deep ReLU networks, we investigate the dynamical system of gradient flow $w(t)$ by decomposing it into magnitude $\|w(t)\|$ and angle $\phi(t) := \pi - \theta(t)$ components. In particular, for multi-layer single ReLU neurons with a spherically symmetric data distribution and the square loss function, we provide upper and lower bounds on the magnitude and angle components to describe the dynamics of gradient flow. Using the obtained bounds, we conclude that small-scale initialization induces slow convergence for deep single ReLU neurons. Finally, by exploiting the relation between gradient flow and gradient descent, we extend our results to the gradient descent setting. All theoretical results are verified by experiments.
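The magnitude and angle decomposition above can be sketched numerically. The snippet below is a minimal illustration, not the paper's construction: a one-layer ReLU neuron stands in for the multi-layer case, and the teacher direction, sample size, learning rate, and step count are all illustrative assumptions. It runs gradient descent on the square loss over spherically symmetric Gaussian inputs and tracks the magnitude $\|w(t)\|$ and the angle between $w(t)$ and the teacher for a large and a small initialization scale.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
w_star = np.zeros(d)
w_star[0] = 1.0                                 # hypothetical teacher direction
X = rng.standard_normal((2000, d))              # spherically symmetric inputs
y = np.maximum(X @ w_star, 0.0)                 # labels from a single ReLU teacher

def train(scale, steps=2000, lr=0.05):
    """Gradient descent on the square loss; returns (magnitude, angle) trajectories."""
    w = scale * rng.standard_normal(d)
    mags, angles = [], []
    for _ in range(steps):
        pred = np.maximum(X @ w, 0.0)
        # gradient of the mean square loss w.r.t. w (ReLU derivative = indicator)
        grad = X.T @ ((pred - y) * (X @ w > 0)) / len(X)
        w -= lr * grad
        mags.append(np.linalg.norm(w))
        cos = np.dot(w, w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))  # angle theta(t) to the teacher
    return mags, angles

for scale in (1.0, 1e-3):
    mags, angles = train(scale)
    print(f"init scale {scale}: final |w| = {mags[-1]:.3f}, final angle = {angles[-1]:.3f}")
```

Plotting the two trajectories shows how the initialization scale shapes the magnitude and angle dynamics; in the deep (multi-layer) case analyzed in the paper, the slowdown from small-scale initialization is more pronounced.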