We propose a new method for training neural networks with the goal of reducing training cost. Instead of the full gradient, which requires an expensive backward pass, our method uses an approximate predicted gradient. We derive a control-variate-based technique that keeps the resulting updates unbiased estimates of the true gradient. Moreover, we propose a novel way, inspired by the theory of the Neural Tangent Kernel, to construct the gradient predictor. We empirically demonstrate the efficacy of the technique on a vision transformer classification task.
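To make the unbiasedness claim concrete, the following is a minimal sketch of one standard control-variate construction; the true gradient $g_t$, predictor $p_t$, and Bernoulli correction rate $q$ are illustrative notation, and the estimator shown is an assumption, not necessarily the one derived in the paper.

% Illustrative control-variate estimator (an assumed construction, not
% necessarily the paper's): let g_t be the true gradient, p_t the cheap
% predicted gradient, and b_t an independent coin deciding on which
% steps a full backward pass is actually paid for.
\[
  \hat{g}_t \;=\; p_t + \frac{b_t}{q}\,\bigl(g_t - p_t\bigr),
  \qquad b_t \sim \mathrm{Bernoulli}(q).
\]
% Since E[b_t] = q, we have E[\hat{g}_t] = p_t + (g_t - p_t) = g_t, so
% the update is an unbiased estimate of the true gradient. The backward
% pass is computed only on the fraction q of steps with b_t = 1, and the
% better p_t approximates g_t, the smaller the variance of the
% correction term.

Under this construction the expected backward-pass cost drops by a factor of $q$, while the variance of $\hat{g}_t$ shrinks as the predictor improves, which is the usual control-variate trade-off.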
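The abstract does not spell out the predictor itself, but one construction consistent with NTK theory is sketched below: in the lazy/NTK regime the network Jacobian changes little during training, so a stale Jacobian can be reused to predict gradients from forward passes alone. The caching schedule and the symbols $J$, $t_0$ are assumptions for illustration, not notation from the paper.

% Hypothetical NTK-inspired gradient predictor (assumed construction,
% not stated in the abstract). By the chain rule the true gradient is
%   g_t = J(\theta_t)^\top \nabla_f L(f(x;\theta_t), y),
% where J(\theta) = \partial f(x;\theta)/\partial\theta is the network
% Jacobian. In the NTK (lazy-training) regime J(\theta_t) stays close
% to its value early in training, so a stale Jacobian cached at a
% refresh step t_0 <= t can stand in for the current one:
\[
  p_t \;=\; J(\theta_{t_0})^{\top}\,\nabla_f L\bigl(f(x;\theta_t),\,y\bigr).
\]
% Computing p_t needs only a forward pass at step t plus the cheap loss
% derivative; the cached Jacobian can be refreshed on the occasional
% steps where a full backward pass is paid for anyway.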