Over recent years, deep learning has become the mainstream data-driven approach for solving many important real-world problems. In successful network architectures, shortcut connections, which feed the outputs of earlier layers as additional inputs to later layers, are well established and have produced excellent results. Despite the extraordinary effectiveness of shortcuts, important questions remain about their underlying mechanism and associated functionalities. For example, why are shortcuts powerful? Why do shortcuts generalize well? To address these questions, we investigate the representation and generalization abilities of a sparse shortcut topology. Specifically, we first demonstrate that this topology can empower a one-neuron-wide deep network to approximate any univariate continuous function. Then, we present a novel width-bounded universal approximator, in contrast to depth-bounded universal approximators, and extend the approximation result to a family of networks that are equally competent in terms of approximation ability. Furthermore, we use generalization bound theory to show that the investigated shortcut topology enjoys excellent generalizability. Finally, we corroborate our theoretical analyses with experiments on well-known benchmarks.
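To make the width-1 claim concrete, below is a minimal NumPy sketch, not the paper's exact construction: we assume a topology in which the scalar input is shortcut into every one-neuron ReLU layer and every layer's activation is shortcut to the output sum; all function and variable names are hypothetical. Setting the layer-to-layer weights to zero collapses the network to a sum of shifted ReLUs, i.e., a piecewise-linear interpolant whose error shrinks as depth grows, which illustrates why depth alone can suffice for univariate approximation.

```python
import numpy as np

def forward(x, layers, out_bias=0.0):
    """Width-1 deep ReLU net with shortcuts: the input x reaches every
    one-neuron layer directly, and every layer feeds the output directly.
    layers: iterable of (w_skip, w_prev, bias, w_out) tuples."""
    h, y = 0.0, out_bias
    for w_skip, w_prev, bias, w_out in layers:
        h = max(0.0, w_skip * x + w_prev * h + bias)  # the layer's single ReLU neuron
        y += w_out * h                                # shortcut from this layer to the output
    return y

# Hand-set weights realizing the piecewise-linear interpolant of sin on
# [0, 2*pi]: with w_prev = 0 the net equals f(t0) + sum_i c_i * relu(x - t_i),
# where c_i are the slope changes of the interpolant at the knots t_i.
f = np.sin
knots = np.linspace(0.0, 2 * np.pi, 33)
slopes = np.diff(f(knots)) / np.diff(knots)
coeffs = np.diff(slopes, prepend=0.0)  # slope change at each knot
layers = [(1.0, 0.0, -t, c) for t, c in zip(knots[:-1], coeffs)]

xs = np.linspace(0.0, 2 * np.pi, 200)
approx = np.array([forward(x, layers, out_bias=f(knots[0])) for x in xs])
print("max error:", np.max(np.abs(approx - f(xs))))  # ~5e-3 with 32 layers
```

Doubling the number of one-neuron layers roughly quarters the interpolation error for smooth targets, which is the intuition behind trading width for depth in this sketch.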