Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. However, the derivative of these solvers is zero or undefined; a meaningful replacement is therefore crucial for effective gradient-based learning. Prior works smooth the solver with input perturbations, relax it to a continuous problem, or interpolate the loss landscape, using techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach that exploits the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass, and we further provide a theoretical justification. Our experiments demonstrate that this straightforward, hyper-parameter-free approach competes with previous, more complex methods on numerous tasks, including backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we replace the previously proposed problem-specific, label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.
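To make the core idea concrete, the following is a minimal PyTorch sketch of the negative-identity backward pass described above. The toy top-k solver and the unit-norm cost normalization (standing in for the cost-collapse regularization mentioned in the abstract) are illustrative assumptions, not the paper's exact setup.

```python
import torch

def topk_solver(cost: torch.Tensor, k: int = 3) -> torch.Tensor:
    # Toy black-box combinatorial solver: the 0/1 indicator of the k
    # lowest-cost items, i.e. argmin over {y in {0,1}^n : sum(y) = k} of <cost, y>.
    indices = torch.topk(cost, k, largest=False).indices
    y = torch.zeros_like(cost)
    y[indices] = 1.0
    return y

class NegativeIdentity(torch.autograd.Function):
    # The solver's true Jacobian is zero or undefined, so the backward pass
    # treats the solver as the negative identity map and returns -grad_output.
    @staticmethod
    def forward(ctx, cost):
        return topk_solver(cost.detach())

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

# Usage: normalize the predicted costs (an assumed, simple regularizer
# against cost collapse), solve, then backpropagate as usual.
cost = torch.randn(10, requires_grad=True)
cost_reg = cost / cost.norm()
y = NegativeIdentity.apply(cost_reg)
loss = (y - torch.ones_like(y)).pow(2).sum()
loss.backward()  # gradient reaches `cost` via the -I surrogate
print(cost.grad)
```

Because the surrogate needs no smoothing distribution, temperature, or interpolation weight, the backward pass adds no solver calls and no hyper-parameters beyond those of the base model.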