This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions, deduced from basic rules. The constructed solutions should be universal enough to explain network architectures used in engineering; to that end, several ways are provided to enhance their universality. The consequences of our theories include the following: Under an affine-geometry background, the solutions of both three-layer networks and deep-layer networks are given, particularly for architectures applied in practice, such as multilayer feedforward neural networks and decoders; we give clear and intuitive interpretations of each component of these network architectures; the parameter-sharing mechanism for multiple outputs is investigated; we explain overparameterized solutions in terms of affine transforms; within our framework, an advantage of deep layers over shallower ones is obtained naturally. Some intermediate results provide basic knowledge for the modeling or understanding of neural networks, such as the classification of data embedded in a higher-dimensional space, the generalization of affine transforms, a probabilistic model of matrix ranks, and the concept of distinguishable data sets.
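As a minimal illustration of the kind of constructed solution the abstract refers to (this sketch is not taken from the paper), a small feedforward ReLU network can realize a given piecewise linear function exactly; here a hand-constructed three-layer network with two hidden units reproduces f(x) = |x|, since |x| = relu(x) + relu(-x):

```python
# Minimal sketch (illustrative, not the paper's construction): a three-layer
# feedforward ReLU network realizing the piecewise linear function f(x) = |x|.

def relu(z):
    return max(0.0, z)

def network(x):
    # Hidden layer: two units with weights +1 and -1, zero biases.
    h1 = relu(1.0 * x)    # active on the piece x > 0
    h2 = relu(-1.0 * x)   # active on the piece x < 0
    # Output layer: sum the two units with unit weights, yielding |x|.
    return 1.0 * h1 + 1.0 * h2

for x in (-2.0, -0.5, 0.0, 1.5):
    print(x, network(x))  # agrees with abs(x) on every linear piece
```

Each hidden unit contributes one linear piece, and the output layer combines them affinely; the paper's solutions generalize this idea to arbitrary piecewise linear targets and deeper architectures.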