This paper aims to interpret the mechanism of feedforward ReLU networks by constructing, from basic rules, their solutions for piecewise linear functions. The constructed solutions should be universal enough to explain the network architectures used in engineering practice. To that end, we borrow the methodology of theoretical physics to develop the theory. Some of its consequences are as follows. First, in a geometric setting, we present solutions for both three-layer and deep-layer networks, and we ensure the universality of these solutions in several ways. Second, we give clear and intuitive interpretations of each component of a network architecture, such as the parameter-sharing mechanism for multiple outputs, the function of each layer, the advantage of deep layers, and the redundancy of parameters. Third, we explain three typical network architectures: the subnetwork formed by the last three layers of convolutional networks, multi-layer feedforward networks, and the decoder of autoencoders. This paper is expected to provide a theoretical foundation for further investigations of feedforward ReLU networks.
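To make concrete what it means for a ReLU network to be a solution for a piecewise linear function, the following minimal sketch builds a network with one hidden layer that exactly reproduces a continuous one-dimensional piecewise linear function. It relies on the standard slope-change identity and is not the construction developed in this paper. The names `make_relu_net`, `breakpoints`, `slopes`, and `anchor` are hypothetical and chosen only for illustration.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def make_relu_net(breakpoints, slopes, anchor):
    """Return a one-hidden-layer ReLU network for a continuous 1-D
    piecewise linear function (illustrative sketch, hypothetical API).

    breakpoints: increasing list b_1 < ... < b_k where the slope changes.
    slopes:      k + 1 slopes; slopes[0] applies left of b_1,
                 slopes[i] applies right of b_i.
    anchor:      a point (x0, y0) on the leftmost linear piece (x0 <= b_1).
    """
    assert len(slopes) == len(breakpoints) + 1
    x0, y0 = anchor
    s0 = slopes[0]
    bias = y0 - s0 * x0  # leftmost piece is y = s0 * x + bias

    # Output weight of the unit at b_i is the slope change at b_i.
    jump = [slopes[i + 1] - slopes[i] for i in range(len(breakpoints))]

    def net(x):
        # Two hidden units encode the linear term: x = relu(x) - relu(-x).
        linear = s0 * relu(x) - s0 * relu(-x)
        # One hidden unit per breakpoint bends the line there.
        bends = sum(w * relu(x - b) for w, b in zip(jump, breakpoints))
        return bias + linear + bends

    return net


# Absolute value: slope -1 left of 0, slope +1 right of 0.
abs_net = make_relu_net([0.0], [-1.0, 1.0], anchor=(-1.0, 1.0))

# Clamp to [0, 1]: flat, then slope 1 on [0, 1], then flat again.
clamp_net = make_relu_net([0.0, 1.0], [0.0, 1.0, 0.0], anchor=(-1.0, 0.0))
```

Each hidden unit contributes one change of slope, so this sketch needs one unit per breakpoint plus two for the initial linear term. How such units generalize to higher-dimensional inputs and deeper architectures is the subject of the theory itself.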