We investigate numerous structural connections between numerical algorithms for partial differential equations (PDEs) and neural architectures. Our goal is to transfer the rich set of mathematical foundations from the world of PDEs to neural networks. Besides structural insights, we provide concrete examples and experimental evaluations of the resulting architectures. Using the example of generalised nonlinear diffusion in 1D, we consider explicit schemes, their accelerated variants, implicit schemes, and multigrid approaches. We connect these concepts to residual networks, recurrent neural networks, and U-net architectures. Our findings inspire a symmetric residual network design with provable stability guarantees and justify the effectiveness of skip connections in neural networks from a numerical perspective. Moreover, we present U-net architectures that implement multigrid techniques for learning efficient solutions of PDE models, and we motivate uncommon design choices such as trainable nonmonotone activation functions. Experimental evaluations show that the proposed architectures require only half as many trainable parameters and can thus outperform standard architectures of the same model complexity. Our considerations serve as a basis for explaining the success of popular neural architectures and provide a blueprint for developing new mathematically well-founded neural building blocks.
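To illustrate the first connection mentioned above, the following minimal sketch (not the paper's code; the diffusivity, step size, and boundary handling are example choices) writes one explicit step of 1D nonlinear diffusion as a residual update u + tau * F(u), which mirrors the structure of a ResNet block x + F(x).

```python
import numpy as np

def diffusivity(s2, lam=1.0):
    # Perona-Malik type diffusivity g(s^2) = 1 / (1 + s^2 / lam^2)
    return 1.0 / (1.0 + s2 / lam**2)

def explicit_diffusion_step(u, tau=0.2, lam=1.0):
    # Forward differences approximate u_x between neighbouring grid points;
    # zero fluxes at both ends realise reflecting (Neumann) boundary conditions.
    du = np.diff(u)                               # u_{i+1} - u_i
    flux = diffusivity(du**2, lam) * du           # g(u_x^2) * u_x
    div = np.diff(flux, prepend=0.0, append=0.0)  # backward difference of the flux
    return u + tau * div                          # residual form: u + tau * F(u)

# Usage: smooth a noisy 1D signal with a few explicit steps.
u = np.sin(np.linspace(0, np.pi, 64)) + 0.1 * np.random.randn(64)
for _ in range(10):
    u = explicit_diffusion_step(u)
```

Since the diffusivity is bounded by 1, the explicit step is stable for tau <= 0.5 on a unit grid; stacking such steps yields the residual-network analogy discussed in the abstract.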