Various powerful deep neural network architectures have contributed greatly to the remarkable successes of deep learning over the past two decades. Among them, deep Residual Networks (ResNets) are of particular importance because they demonstrated great usefulness in computer vision by winning first place in several major deep learning competitions. Moreover, ResNets were the first class of neural networks in the history of deep learning that were truly deep. It is therefore of both mathematical interest and practical significance to understand the convergence of deep ResNets. We aim to characterize the convergence of deep ResNets, as the depth tends to infinity, in terms of the parameters of the networks. Toward this purpose, we first give a matrix-vector description of general deep neural networks with shortcut connections and formulate an explicit expression for such networks by using the notions of activation domains and activation matrices. The convergence problem is then reduced to the convergence of two series involving infinite products of non-square matrices. By studying these two series, we establish a sufficient condition for the pointwise convergence of ResNets. Our result provides justification for the design of ResNets. We also conduct experiments on benchmark machine learning data to verify our theoretical results.
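The convergence behavior described above can be illustrated numerically. The following is a minimal sketch, not the paper's formal construction or its exact sufficient condition: it runs a toy ResNet iteration with shortcut connections, x_{n+1} = x_n + W_n ReLU(x_n), in which the per-layer weight norms decay like 1/n^2 so that their series is summable. The width d, the 1/n^2 decay rate, and the function names are all assumptions made for illustration; under them, the output stabilizes as the depth grows, consistent with pointwise convergence as depth tends to infinity.

```python
import numpy as np

d = 8  # width of every layer (square weights, chosen for simplicity)

def resnet_output(x, depth):
    """Forward pass of a toy ResNet of the given depth.

    Each layer adds a shortcut connection to a ReLU branch whose
    weight matrix W_n has norm on the order of 1/n^2, a summable
    sequence (an illustrative assumption, not the paper's condition).
    """
    for n in range(1, depth + 1):
        layer_rng = np.random.default_rng(n)  # deterministic per-layer weights
        W = layer_rng.standard_normal((d, d)) / (n ** 2 * np.sqrt(d))
        x = x + W @ np.maximum(x, 0.0)  # identity shortcut + ReLU branch
    return x

x0 = np.random.default_rng(0).standard_normal(d)
for depth in (10, 100, 1000, 10000):
    print(depth, resnet_output(x0, depth)[:3])
# The printed coordinates change less and less as depth grows,
# illustrating the stabilization of deep ResNet outputs.
```

Because each layer reuses the same seeded weights, deepening the network only appends layers with ever-smaller perturbations, so successive outputs form a Cauchy-like sequence; this mirrors, in a purely numerical way, the reduction of convergence to summable series mentioned in the abstract.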