Large-scale deep neural networks incur expensive training costs, yet the training yields weight matrices that are hard to interpret. Here, we propose mode decomposition learning, which interprets the weight matrices as a hierarchy of latent modes. These modes are akin to patterns in physics studies of memory networks, but the minimal number of modes grows only logarithmically with the network width, and even saturates to a constant as the width grows further. Mode decomposition learning not only saves a significant amount of training cost, but also explains the network performance through the leading modes, which display a striking piecewise power-law behavior. The modes specify a progressively compact latent space across the network hierarchy, yielding more disentangled subspaces than standard training. We also study mode decomposition learning in an analytic on-line learning setting, which reveals multi-stage learning dynamics with a continuous specialization of hidden nodes. The proposed mode decomposition learning therefore points to a cheap and interpretable route towards deep learning.
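To make the idea concrete, the following is a minimal sketch, not the authors' implementation, of a layer whose weight matrix is parameterized by k latent modes, assuming a rank-k expansion W = Σ_μ ξ_μ ζ_μ^T with trainable input modes ζ_μ and output modes ξ_μ (the class and parameter names are illustrative). With such a parameterization the number of trainable parameters scales as k(n_in + n_out) rather than n_in·n_out, which is why a logarithmic (or constant) number of modes would translate into large training savings.

```python
import torch
import torch.nn as nn

class ModeDecomposedLinear(nn.Module):
    """Linear layer parameterized by k latent modes (hypothetical sketch).

    The full weight is reconstructed as W = sum_mu xi_mu zeta_mu^T,
    so only 2*k mode vectors (plus a bias) are trained per layer.
    """

    def __init__(self, n_in: int, n_out: int, k_modes: int):
        super().__init__()
        # Input-side and output-side mode vectors, k of each.
        self.zeta = nn.Parameter(torch.randn(k_modes, n_in) / n_in ** 0.5)
        self.xi = nn.Parameter(torch.randn(k_modes, n_out) / n_out ** 0.5)
        self.bias = nn.Parameter(torch.zeros(n_out))

    def weight(self) -> torch.Tensor:
        # Rank-k reconstruction of the weight matrix, shape (n_out, n_in).
        return self.xi.t() @ self.zeta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Project the input onto the input modes, then expand via output modes.
        return (x @ self.zeta.t()) @ self.xi + self.bias


# Example usage: an 8-mode layer in place of a dense 784 -> 256 weight matrix.
layer = ModeDecomposedLinear(n_in=784, n_out=256, k_modes=8)
y = layer(torch.randn(32, 784))
```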