This paper aims to interpret the mechanism of feedforward ReLU networks by exploring their solutions for piecewise linear functions, deduced from basic rules. The constructed solution should be general enough to explain the network architectures used in engineering; to that end, several ways of enhancing the universality of the solution are provided. The consequences of our theory include the following: Against an affine-geometry background, the solutions of both three-layer networks and deep-layer networks are given, particularly for architectures applied in practice, such as multilayer feedforward neural networks and decoders; we give clear and intuitive interpretations of each component of a network architecture; the parameter-sharing mechanism for multiple outputs is investigated; we provide an explanation of overparameterized solutions in terms of affine transforms; and, under our framework, an advantage of deep architectures over shallow ones is obtained naturally. Some intermediate results provide basic knowledge for the modeling or understanding of neural networks, such as the classification of data embedded in a higher-dimensional space, the generalization of affine transforms, a probabilistic model of matrix ranks, and the concepts of distinguishable data sets and interference among hyperplanes.
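As a minimal illustration of the central premise (this sketch is not from the paper itself), a one-hidden-layer feedforward ReLU network can represent a simple piecewise linear function exactly; for example, |x| = relu(x) + relu(-x), realized by a hidden layer of two units with hand-chosen weights:

```python
import numpy as np

def relu_net(x, W, b, v, c):
    """One-hidden-layer ReLU network for scalar input x (vectorized over x):
    y = v^T relu(W x + b) + c."""
    h = np.maximum(np.outer(x, W) + b, 0.0)  # hidden-layer activations
    return h @ v + c

# Weights realizing f(x) = |x| = relu(x) + relu(-x)
W = np.array([1.0, -1.0])  # input-to-hidden weights
b = np.array([0.0, 0.0])   # hidden biases
v = np.array([1.0, 1.0])   # hidden-to-output weights
c = 0.0                    # output bias

xs = np.linspace(-2.0, 2.0, 9)
print(np.allclose(relu_net(xs, W, b, v, c), np.abs(xs)))  # True
```

Each hidden unit contributes one "kink" (a hyperplane in higher dimensions), which is the kind of building block the paper's constructed solutions assemble.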