A Deep Neural Network (DNN) is a composition of vector-valued functions, and training a DNN requires computing the gradient of the loss function with respect to all of its parameters. This computation is non-trivial because the loss function of a DNN is a composition of several nonlinear functions, each with numerous parameters. The Backpropagation (BP) algorithm exploits the composite structure of the DNN to compute the gradient efficiently, so the cost of the calculation grows only linearly with the number of layers. The objective of this paper is to express the gradient of the loss function as a matrix product using the Jacobian operator. This is achieved by taking the total derivative of each layer with respect to its parameters and expressing it as a Jacobian matrix; the gradient of the loss then factors as the product of these Jacobian matrices. This approach is valid because the chain rule applies to compositions of vector-valued functions, and Jacobian matrices naturally accommodate multiple inputs and outputs. By providing concise mathematical justifications, the results are made understandable and useful to a broad audience from various disciplines.
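As a minimal illustrative sketch of the factorization described above (the symbols $f_k$, $\theta_k$, and $a_k$ are notation introduced here for illustration, not necessarily the paper's own), consider a network of $K$ layers applied to an input $x$:
\[
\mathcal{L} \;=\; \ell\bigl(f_K(\cdots f_2(f_1(x;\theta_1);\theta_2)\cdots;\theta_K)\bigr),
\qquad a_0 = x,\quad a_k = f_k(a_{k-1};\theta_k).
\]
Applying the chain rule for vector-valued functions, the gradient of the loss with respect to the parameters of layer $k$ is a product of Jacobian matrices,
\[
\frac{\partial \mathcal{L}}{\partial \theta_k}
\;=\;
\frac{\partial \ell}{\partial a_K}\,
J_{a_{K-1}} f_K \; J_{a_{K-2}} f_{K-1} \,\cdots\, J_{a_k} f_{k+1}\;
J_{\theta_k} f_k,
\]
where $J_{a} f$ denotes the Jacobian of $f$ with respect to $a$. Because the leading product of Jacobians is shared across layers and can be accumulated once during the backward pass, the overall cost of the gradient computation scales linearly with the depth $K$.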