Neural Path Features and Neural Path Kernel: Understanding the role of gates in deep learning

Rectified linear unit (ReLU) activations can also be thought of as 'gates' that either pass or stop their pre-activation input: a gate is 'on' when the pre-activation input is positive and 'off' when it is negative. A deep neural network (DNN) with ReLU activations has many gates, and the on/off status of each gate varies with the input example as well as with the network weights. For a given input example, only a subset of the gates is 'active', i.e., on, and the sub-network of weights connected to these active gates is responsible for producing the output. At randomised initialisation, the active sub-network corresponding to a given input example is random. During training, as the weights are learnt, the active sub-networks are also learnt, and potentially hold very valuable information. In this paper, we analytically characterise the role of active sub-networks in deep learning. To this end, we encode the on/off state of the gates for a given input in a novel 'neural path feature' (NPF), and we encode the weights of the DNN in a novel 'neural path value' (NPV). Further, we show that the output of the network is the inner product of the NPF and the NPV. The main result of the paper shows that the 'neural path kernel' associated with the NPF is a fundamental quantity that characterises the information stored in the gates of a DNN. We show via experiments (on MNIST and CIFAR-10) that, in standard DNNs with ReLU activations, NPFs are learnt during training and that such learning is key for generalisation. Furthermore, NPFs and NPVs can be learnt in two separate networks, and such learning also generalises well in experiments.
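The claim that the network output is the inner product of the NPF and the NPV can be checked numerically on a toy network. The sketch below is illustrative only and rests on assumptions the abstract does not spell out: a small bias-free, fully connected ReLU network with a single output; a 'path' taken to be one choice of node per layer from input to output; the NPF entry for a path taken to be the input value at the path's starting node multiplied by the 0/1 gates along the path; and the NPV entry taken to be the product of the weights along the path. The layer sizes and the names `forward` and `npf_npv` are hypothetical and chosen only for this example.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Toy bias-free ReLU network (sizes are an assumption for illustration):
# 3 inputs -> 4 hidden -> 4 hidden -> 1 output.
sizes = [3, 4, 4, 1]
weights = [rng.standard_normal((sizes[i + 1], sizes[i]))
           for i in range(len(sizes) - 1)]

def forward(x):
    """Ordinary forward pass; also returns the 0/1 gate vector of each hidden layer."""
    gates = []
    h = x
    for W in weights[:-1]:
        pre = W @ h
        g = (pre > 0).astype(float)  # gate is 'on' iff the pre-activation is positive
        gates.append(g)
        h = pre * g                  # ReLU written as gate * pre-activation
    return (weights[-1] @ h)[0], gates

def npf_npv(x):
    """Enumerate every input-to-output path; return (feature, value) vectors."""
    _, gates = forward(x)
    npf, npv = [], []
    for path in itertools.product(*(range(n) for n in sizes[:-1])):
        nodes = path + (0,)          # every path ends at the single output node
        # Feature: input at the path's first node times the gates along the path.
        feat = x[nodes[0]]
        for layer, g in enumerate(gates, start=1):
            feat = feat * g[nodes[layer]]
        # Value: product of the weights along the path.
        val = 1.0
        for layer, W in enumerate(weights):
            val = val * W[nodes[layer + 1], nodes[layer]]
        npf.append(feat)
        npv.append(val)
    return np.array(npf), np.array(npv)

x = rng.standard_normal(sizes[0])
y, _ = forward(x)
phi, v = npf_npv(x)

# The inner product over all paths reproduces the forward-pass output.
assert np.isclose(y, phi @ v)
```

Under the same assumptions, a kernel entry for two inputs `x1` and `x2` could be computed as `npf_npv(x1)[0] @ npf_npv(x2)[0]`, i.e. the inner product of their path features. Explicit enumeration grows exponentially with depth, so this is only practical for very small networks.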
