Backpropagation, the cornerstone of deep learning, computes gradients only for continuous variables. This limitation hinders research on problems involving discrete latent variables. To address this issue, we propose a novel approach for approximating the gradient of parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and demonstrate that it works as a first-order approximation of the gradient. Guided by this finding, we propose ReinMax, a novel method that integrates Heun's method, a second-order numerical technique for solving ordinary differential equations (ODEs), to approximate the gradient. ReinMax achieves second-order accuracy without requiring the Hessian or other second-order derivatives. We conduct experiments on structured output prediction and unsupervised generative modeling tasks. Our results show that ReinMax brings consistent improvements over the state of the art, including ST and Straight-Through Gumbel-Softmax. Implementations are released at https://github.com/microsoft/ReinMax.
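For readers unfamiliar with the ST heuristic analyzed above, below is a minimal PyTorch sketch (illustrative, not the released implementation; the function name `straight_through_sample` is ours): the forward pass emits a hard one-hot sample, while the backward pass routes gradients through softmax(θ), which is precisely the first-order approximation the abstract describes.

```python
import torch

def straight_through_sample(logits: torch.Tensor) -> torch.Tensor:
    """Draw a one-hot sample; backpropagate as if it were softmax(logits)."""
    p = logits.softmax(dim=-1)                      # differentiable probabilities
    idx = torch.multinomial(p, num_samples=1)       # discrete categorical sample
    d = torch.zeros_like(p).scatter_(-1, idx, 1.0)  # hard one-hot forward value
    # Straight-through trick: forward value is d, backward gradient is that of p.
    return d + p - p.detach()

logits = torch.randn(4, 10, requires_grad=True)
sample = straight_through_sample(logits)  # one-hot rows, shape (4, 10)
sample.sum().backward()                   # gradient reaches logits via softmax(logits)
```

ReinMax refines this scheme by replacing the single softmax evaluation in the backward pass with a Heun-style average of two evaluations, which is what yields the second-order accuracy claimed above without computing a Hessian; the repository linked in the abstract provides the reference implementation.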