In this paper, we consider Discretized Neural Networks (DNNs) with low-precision weights and activations, whose training suffers from either infinite or zero gradients caused by the non-differentiable discrete function. Most training-based methods therefore use the standard Straight-Through Estimator (STE) to approximate the gradient w.r.t. the discrete values. However, the STE causes a gradient mismatch: the approximated gradient carries perturbations. Through the lens of duality theory, we show that this mismatch can be viewed as a metric perturbation on a Riemannian manifold. To address this problem, we draw on information geometry to construct a Linearly Nearly Euclidean (LNE) manifold for DNNs as a background geometry in which perturbations can be handled. By introducing the Ricci flow, a partial differential equation on metrics, we prove the dynamical stability and convergence of the LNE metric under $L^2$-norm perturbations. Unlike previous perturbation theory, which yields convergence rates with fractional powers, we show that the metric perturbation under the Ricci flow decays exponentially on the LNE manifold. The experimental results on various datasets demonstrate that our method achieves better and more stable performance for DNNs than other representative training-based methods.
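For concreteness, below is a minimal PyTorch sketch of the standard STE the abstract refers to, here applied to the sign binarizer; the class name `SignSTE` and the clipping threshold are illustrative assumptions, not the paper's code. The backward pass replaces the zero-almost-everywhere derivative of the discrete function with an identity pass-through, which is precisely the source of the gradient mismatch discussed above.

```python
import torch

class SignSTE(torch.autograd.Function):
    """Binarize in the forward pass; pass the gradient straight
    through in the backward pass (standard STE with clipping)."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)  # discrete, non-differentiable output

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # STE: approximate d sign(x)/dx by 1 on |x| <= 1, 0 elsewhere.
        # This approximation perturbs the true gradient (the mismatch).
        return grad_output * (x.abs() <= 1).to(grad_output.dtype)

# Usage inside a layer's forward pass: w_q = SignSTE.apply(w)
```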