Despite their spectacular successes, deep neural networks (DNNs) with huge numbers of adjustable parameters remain largely black boxes. To shed light on the hidden layers of DNNs, we study supervised learning by a DNN of width $N$ and depth $L$, built out of perceptrons with $c$ inputs each, using a statistical-mechanics approach called the teacher-student setting. We consider an ensemble of student machines that exactly reproduce $M$ sets of $N$-dimensional input/output relations provided by a teacher machine. We analyze this ensemble theoretically using a replica method (H. Yoshino (2020)) and numerically by performing greedy Monte Carlo simulations. The replica theory, which works for high-dimensional data $N \gg 1$, becomes exact in the 'dense limit' $N \gg c \gg 1$ and $M \gg 1$ with $\alpha=M/c$ fixed. Both the theory and the simulations suggest that learning by the DNN is quite heterogeneous across the network: configurations of the machines are more strongly correlated within the layers closer to the input/output boundaries, while the central region remains much less correlated because of over-parametrization. Deep enough systems relax faster thanks to this less correlated central region. Remarkably, both the theory and the simulations suggest that the generalization ability of the student machines does not vanish even in the deep limit $L \gg 1$, where the system becomes strongly over-parametrized. We also consider the impact of the effective dimension $D (\leq N)$ of the data by incorporating the hidden manifold model (S. Goldt et al. (2020)) into our model. The replica theory implies that the loop corrections to the dense limit, which reflect correlations between different nodes of the network, are enhanced by decreasing either the width $N$ or the effective dimension $D$ of the data. Simulations suggest that both lead to significant improvements in generalization ability.
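To make the setup concrete, the following is a minimal Python sketch of the teacher-student setting with greedy (zero-temperature) Monte Carlo learning. The toy sizes, the random sparse wiring, and the choice of Ising $\pm 1$ weights with sign activations are illustrative assumptions rather than the paper's exact model (which, e.g., also treats the internal spins as dynamical variables and operates at $N \gg c \gg 1$); it shows only the basic idea of a student flipping weights to reproduce a teacher's input/output pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumed for illustration; the theory works at N >> c >> 1, alpha = M/c fixed).
N, L, c, M = 16, 4, 4, 8   # width, depth, inputs per perceptron, training examples

def random_weights():
    # One c-input perceptron per node and layer: shape (L, N, c), Ising +/-1 weights.
    return rng.choice([-1, 1], size=(L, N, c))

# Fixed random sparse wiring: node i of layer l reads c nodes of layer l-1.
wiring = rng.integers(0, N, size=(L, N, c))

def forward(weights, x):
    """Propagate an N-dimensional +/-1 input through the network of sign perceptrons."""
    s = x
    for l in range(L):
        pre = np.einsum('ic,ic->i', weights[l], s[wiring[l]])
        s = np.where(pre >= 0, 1, -1)
    return s

# The teacher provides M sets of N-dimensional input/output relations.
teacher = random_weights()
X = rng.choice([-1, 1], size=(M, N))
Y = np.array([forward(teacher, x) for x in X])

def n_errors(weights):
    # Number of output bits on which the student disagrees with the teacher.
    return sum(np.sum(forward(weights, x) != y) for x, y in zip(X, Y))

# Greedy Monte Carlo: flip one weight at a time, keeping the flip only if the
# training error does not increase (equal-energy moves are accepted).
student = random_weights()
e = n_errors(student)
for step in range(20000):
    l, i, k = rng.integers(L), rng.integers(N), rng.integers(c)
    student[l, i, k] *= -1
    e_new = n_errors(student)
    if e_new <= e:
        e = e_new
    else:
        student[l, i, k] *= -1  # reject the flip
    if e == 0:
        break
print('remaining training errors:', e)
```

A student reaching zero errors is one member of the ensemble studied in the abstract; sampling many such solutions and comparing their weights layer by layer is what reveals the heterogeneous correlation profile described above.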