This paper focuses on training implicit models with infinitely many layers. Specifically, previous works employ implicit differentiation and solve for the exact gradient in the backward pass. However, is it necessary to compute such an exact but expensive gradient for training? In this work, we propose a novel gradient estimate for implicit models, named the phantom gradient, that 1) forgoes the costly computation of the exact gradient; and 2) provides an update direction that is empirically preferable for training implicit models. We theoretically analyze the condition under which an ascent direction of the loss landscape can be found, and provide two specific instantiations of the phantom gradient based on damped unrolling and the Neumann series. Experiments on large-scale tasks demonstrate that these lightweight phantom gradients significantly accelerate the backward passes in training implicit models, by roughly 1.7 times, and even boost performance over approaches based on the exact gradient on ImageNet.
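Below is a minimal sketch of the unrolling-based instantiation described above, written in PyTorch under the following assumptions: the class name `PhantomDEQ`, the toy update function (a single linear layer with tanh), and the hyperparameters `solver_steps`, `k`, and `lam` are illustrative choices, not the paper's released implementation. The forward pass solves the fixed point with gradients disabled; the backward pass differentiates only through a few damped unrolled steps instead of solving the exact implicit-differentiation linear system.

```python
import torch
import torch.nn as nn


class PhantomDEQ(nn.Module):
    """Toy implicit layer trained with an unrolling-based phantom gradient.

    Forward: solve z* = f(z*, x) by plain fixed-point iteration, no gradients.
    Backward: backpropagate only through k damped unrolled steps started from
    the detached fixed point, rather than through the exact gradient.
    """

    def __init__(self, dim, solver_steps=30, k=5, lam=0.5):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(2 * dim, dim), nn.Tanh())
        self.solver_steps, self.k, self.lam = solver_steps, k, lam

    def layer(self, z, x):
        # One application of the implicit layer's update function f(z, x).
        return self.f(torch.cat([z, x], dim=-1))

    def forward(self, x):
        z = torch.zeros_like(x)
        with torch.no_grad():                      # cheap, gradient-free solver
            for _ in range(self.solver_steps):
                z = self.layer(z, x)
        z = z.detach()                             # cut the solver out of the graph
        for _ in range(self.k):                    # k damped unrolled steps
            z = self.lam * self.layer(z, x) + (1 - self.lam) * z
        return z                                   # gradients flow through these k steps only


if __name__ == "__main__":
    model = PhantomDEQ(dim=16)
    x = torch.randn(8, 16)
    loss = model(x).pow(2).mean()
    loss.backward()                                # phantom gradient w.r.t. model.f's parameters
```

The same interface could host the Neumann-series instantiation by replacing the k unrolled steps with a truncated series for the inverse Jacobian term; the sketch above only illustrates the damped-unrolling variant.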