In the pursuit of explaining implicit regularization in deep learning, prominent focus has been given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of the conventional wisdom that architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.
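The "dynamical systems" intuition behind implicit low-rank regularization can be illustrated on a far simpler model than the hierarchical tensor factorization analyzed here. The sketch below is a toy analog only, not the paper's model or method: gradient descent on a diagonal depth-3 matrix factorization from small, balanced initialization. The function names, the threshold-based `effective_rank` proxy, and all hyperparameters (`init`, `lr`, `steps`) are illustrative assumptions.

```python
import math


def effective_rank(products, targets, frac=0.5):
    """Hypothetical rank proxy: count diagonal entries that have reached
    at least `frac` of their target value."""
    return sum(1 for w, t in zip(products, targets) if w >= frac * t)


def train_diagonal_factorization(targets, depth=3, init=0.01, lr=0.1,
                                 steps=8000, record_every=10):
    """Gradient descent on a diagonal depth-N factorization: entry i of the
    end matrix is w_i = a_{i,1} * ... * a_{i,N}, fitted to targets[i] under
    the loss 0.5 * sum_i (w_i - targets[i])**2."""
    # Small, balanced initialization: every factor starts at `init`.
    factors = [[init] * depth for _ in targets]
    history = []  # (step, effective rank) pairs
    for step in range(steps):
        products = [math.prod(f) for f in factors]
        if step % record_every == 0:
            history.append((step, effective_rank(products, targets)))
        for i, f in enumerate(factors):
            residual = products[i] - targets[i]
            # dL/da_{i,j} = residual * product of the other factors of entry i
            grads = [residual * math.prod(f[:j] + f[j + 1:]) for j in range(depth)]
            factors[i] = [a - lr * g for a, g in zip(f, grads)]
    final = [math.prod(f) for f in factors]
    history.append((steps, effective_rank(final, targets)))
    return final, history


final, history = train_diagonal_factorization([1.0, 0.5, 0.25])
ranks = [r for _, r in history]
print(sorted(set(ranks)))  # [0, 1, 2, 3]: entries are fitted one at a time
```

Near a small initialization, each factor grows at a rate roughly proportional to its target times the product of the other factors, so an entry's growth accelerates as it becomes larger. Entries with larger targets are therefore fitted first while the rest stay near zero, and the intermediate iterates are close to low rank.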