We propose a novel approach to addressing the vanishing (or exploding) gradient problem in deep neural networks. We construct a new architecture in which every layer of the network (except the output layer) is a composition of rotation, permutation, diagonal, and activation sublayers, each of which is volume preserving; in effect, the standard weight matrix is replaced by a product of diagonal, rotation, and permutation matrices. We introduce a coupled activation function that preserves volume through the activation portion of each layer as well. Controlling the volume in this way forces the gradient (on average) to maintain equilibrium rather than explode or vanish. To demonstrate the architecture, we apply our volume-preserving neural network model to two standard datasets.
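To make the layer structure concrete, the following is a minimal sketch (not the authors' reference implementation) of one volume-preserving layer in PyTorch, under these assumptions: the rotation sublayer is parameterized as exp(A - A^T), which is orthogonal with determinant 1; the diagonal sublayer uses entries exp(s_i - mean(s)) so their product is exactly 1; the permutation sublayer is a fixed random permutation (determinant ±1); and the coupled activation is stood in for by an additive-coupling step whose Jacobian is unit triangular, a standard volume-preserving nonlinearity that may differ from the paper's own construction.

```python
import torch
import torch.nn as nn


class VolumePreservingLayer(nn.Module):
    """Sketch of a layer whose sublayers all have |det(Jacobian)| = 1."""

    def __init__(self, dim: int):
        super().__init__()
        assert dim % 2 == 0, "the coupling step below splits the features in half"
        self.skew_raw = nn.Parameter(torch.randn(dim, dim) * 0.01)  # rotation parameters
        self.log_diag = nn.Parameter(torch.zeros(dim))              # diagonal parameters
        self.register_buffer("perm", torch.randperm(dim))           # fixed permutation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Rotation sublayer: Q = exp(A - A^T) is orthogonal with det(Q) = 1.
        skew = self.skew_raw - self.skew_raw.T
        q = torch.matrix_exp(skew)
        x = x @ q.T

        # Diagonal sublayer: entries multiply to 1, so volume is preserved.
        d = torch.exp(self.log_diag - self.log_diag.mean())
        x = x * d

        # Permutation sublayer: reorders coordinates, |det| = 1.
        x = x[:, self.perm]

        # Coupled activation (additive-coupling stand-in): Jacobian is
        # unit triangular, hence volume preserving.
        x1, x2 = x.chunk(2, dim=-1)
        return torch.cat([x1, x2 + torch.tanh(x1)], dim=-1)


# Usage: stacking such layers keeps |det(Jacobian)| = 1 end to end,
# which is the property the abstract ties to gradient stability.
layer = VolumePreservingLayer(dim=8)
out = layer(torch.randn(4, 8))
```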