Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework, which schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at https://www.saxelab.org/gated-dln.
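The gating mechanism is what lets these otherwise-linear networks compute nonlinear functions: within any fixed gating pattern the input-output map is linear, but the pattern itself changes with the input. A minimal sketch of this idea, assuming per-unit binary gates derived from the input's sign pattern (the variable names and gating rule here are illustrative choices, not the paper's released code):

```python
import numpy as np

# Minimal sketch of a gated deep linear network: a two-layer linear
# network whose hidden units are multiplicatively gated on or off
# as a function of the input.

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 4, 8, 3
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))   # input-to-hidden weights
W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))  # hidden-to-output weights

def gate_fn(x):
    """One binary gate per hidden unit, set by the input's sign pattern.

    Tiling the sign pattern across hidden units is an arbitrary
    illustrative choice; any input- or context-dependent rule works.
    """
    return np.tile((x > 0).astype(float), n_hidden // n_in)

def forward(x):
    g = gate_fn(x)        # gates depend on the input
    h = g * (W1 @ x)      # gating multiplies otherwise-linear hidden activity
    return W2 @ h         # output is linear within a fixed gating pattern

x = rng.normal(size=n_in)
print(forward(x))
# Flipping the sign of x flips the gates, so in general
# forward(-x) != -forward(x): the overall map is nonlinear.
print(forward(-x), -forward(x))
```

Because each gating pattern selects a different active pathway through the same weights, the network as a whole computes a piecewise-linear function, which is what makes an exact reduction of the learning dynamics tractable per pathway.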