To elucidate the plateau phenomena caused by vanishing gradients, we analyse the stability of stochastic gradient descent near degenerate subspaces in a multi-layer perceptron. For stochastic gradient descent in the Fukumizu-Amari model, the minimal multi-layer perceptron that shows non-trivial plateau phenomena, we show that (1) attracting regions exist in multiply degenerate subspaces, (2) a strong plateau phenomenon emerges as a noise-induced synchronisation that is not observed in deterministic gradient descent, and (3) an optimal fluctuation exists that minimises the escape time from the degenerate subspace. We expect the noise-induced degeneration observed here to be found in a broad class of machine learning with neural networks.
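
As a rough illustration of what "near a degenerate subspace" means, the following minimal sketch runs online stochastic gradient descent on a toy perceptron with two tanh hidden units and tracks the gap |w1 - w2| between their input weights. Everything in it is an assumption made for illustration: the scalar input, the teacher weights, the learning rate, and the names `student`, `sgd_step`, and `run` are hypothetical, and the network is not claimed to reproduce the Fukumizu-Amari model or the analysis summarised above.

```python
import numpy as np

def student(x, w, v):
    """Toy two-hidden-unit perceptron: sum_i v[i] * tanh(w[i] * x)."""
    return float(np.dot(v, np.tanh(w * x)))

def sgd_step(w, v, x, y, lr):
    """One online SGD step on the squared error 0.5 * (f(x) - y)**2."""
    h = np.tanh(w * x)                    # hidden activations
    err = np.dot(v, h) - y                # prediction error on this sample
    grad_v = err * h                      # d(loss)/dv
    grad_w = err * v * (1.0 - h**2) * x   # d(loss)/dw via the chain rule
    return w - lr * grad_w, v - lr * grad_v

def run(lr=0.05, steps=2000, seed=0):
    """Train from a nearly degenerate start and record |w1 - w2| per step."""
    rng = np.random.default_rng(seed)
    # Hypothetical teacher with two distinct hidden units.
    w_teacher = np.array([1.5, -0.5])
    v_teacher = np.array([1.0, 1.0])
    # Student starts close to the subspace w1 == w2, v1 == v2.
    w = np.array([0.30, 0.31])
    v = np.array([0.5, 0.5])
    gaps = np.empty(steps)
    for t in range(steps):
        x = rng.standard_normal()         # one fresh sample per step
        y = student(x, w_teacher, v_teacher)
        w, v = sgd_step(w, v, x, y, lr)
        gaps[t] = abs(w[0] - w[1])        # distance from the degenerate subspace
    return w, v, gaps

if __name__ == "__main__":
    _, _, gaps = run()
    print("final |w1 - w2|:", gaps[-1])
```

In this toy setting the subspace w1 = w2, v1 = v2 is exactly invariant under the update, so a trajectory that starts on it never leaves. Comparing `gaps` across several values of `lr` gives a crude handle on how the strength of the sampling fluctuations affects the time spent near that subspace, since in online learning the step size scales the noise in each update.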