Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined; a meaningful replacement is therefore crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to continuous problems, or interpolating the loss landscape with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach that exploits the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass, and we further provide a theoretical justification. Our experiments demonstrate that this straightforward, hyper-parameter-free approach is competitive with previous, more complex methods on numerous tasks, including backpropagation through discrete samplers, deep graph matching, and image retrieval. Furthermore, we replace the previously proposed problem-specific, label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.
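To make the idea concrete, here is a minimal PyTorch sketch of how a black-box discrete solver could be wired into autograd with a negative identity on the backward pass. The `NegativeIdentitySolver` wrapper and the toy `argmin_solver` oracle are hypothetical names for illustration, and the cost normalization at the end is only an assumed stand-in for the generic regularization the abstract mentions, not necessarily the exact procedure.

```python
import torch


class NegativeIdentitySolver(torch.autograd.Function):
    """Black-box combinatorial solver wrapped as a differentiable layer.

    Forward: run the discrete solver on the continuous costs.
    Backward: treat the solver as a negative identity, i.e. return the
    incoming gradient with a flipped sign, with no extra solver calls.
    """

    @staticmethod
    def forward(ctx, costs, solver):
        # `solver` is any callable mapping a cost tensor to a discrete
        # solution tensor (e.g. a matching or shortest-path oracle).
        with torch.no_grad():
            solution = solver(costs)
        return solution

    @staticmethod
    def backward(ctx, grad_output):
        # Negative identity on the backward pass: the gradient w.r.t.
        # the costs is the negated incoming gradient; the non-tensor
        # `solver` argument receives no gradient.
        return -grad_output, None


def argmin_solver(costs):
    # Toy discrete solver: one-hot indicator of the per-row argmin.
    return torch.nn.functional.one_hot(
        costs.argmin(dim=-1), num_classes=costs.shape[-1]
    ).float()


costs = torch.randn(4, 10, requires_grad=True)
# Assumed stand-in for the generic regularization: normalizing the costs
# keeps their scale from collapsing during training.
normalized = costs / (costs.norm(dim=-1, keepdim=True) + 1e-8)
solution = NegativeIdentitySolver.apply(normalized, argmin_solver)
loss = solution.sum()
loss.backward()
print(costs.grad.shape)  # torch.Size([4, 10])
```

Note that the backward pass is a single sign flip of the incoming gradient: unlike perturbation- or interpolation-based schemes, it introduces no additional solver calls and no hyper-parameters.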