In this paper, we consider Discretized Neural Networks (DNNs) consisting of low-precision weights and activations, which suffer from either infinite or zero gradients caused by the non-differentiable discrete function in the training process. In this case, most training-based DNNs employ the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. the discrete values. However, the standard STE gives rise to the gradient mismatch problem, i.e., the approximated gradient direction may deviate from the steepest descent direction. In other words, the gradient mismatch implies that the approximated gradient carries perturbations. To address this problem, we introduce duality theory to regard the perturbation of the approximated gradient as the perturbation of the metric on Linearly Nearly Euclidean (LNE) manifolds. Simultaneously, under the Ricci-DeTurck flow, we prove the dynamical stability and convergence of the LNE metric with $L^2$-norm perturbations, which provides a theoretical solution to the gradient mismatch problem. In practice, we also present the steepest descent gradient flow for DNNs on LNE manifolds from the viewpoints of information geometry and mirror descent. Experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.
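To make the gradient mismatch concrete, the following is a minimal sketch (not from the paper, assuming a PyTorch setting and a sign-based discretizer) of the standard STE: the forward pass applies the non-differentiable discretization, while the backward pass treats it as the identity, so the surrogate gradient can deviate from the true steepest descent direction.

```python
# Minimal illustrative sketch of the standard Straight-Through Estimator (STE).
# Assumption: a sign-based discretizer; the paper's exact quantizer may differ.
import torch


def ste_sign(x: torch.Tensor) -> torch.Tensor:
    """Discretize x with torch.sign in the forward pass, pass gradients straight through."""
    # (torch.sign(x) - x).detach() blocks the true gradient of sign (zero almost everywhere);
    # adding x back makes autograd see d(output)/d(x) = 1, the identity surrogate.
    return x + (torch.sign(x) - x).detach()


if __name__ == "__main__":
    w = torch.randn(4, requires_grad=True)
    loss = ste_sign(w).sum()
    loss.backward()
    print(w.grad)  # all ones: the identity surrogate, not the (zero) true gradient
```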