Embedding discrete solvers as differentiable layers has given modern deep learning architectures combinatorial expressivity and discrete reasoning capabilities. The derivative of these solvers is zero or undefined; a meaningful replacement is therefore crucial for effective gradient-based learning. Prior works rely on smoothing the solver with input perturbations, relaxing the solver to a continuous problem, or interpolating the loss landscape, with techniques that typically require additional solver calls, introduce extra hyper-parameters, or compromise performance. We propose a principled approach that exploits the geometry of the discrete solution space to treat the solver as a negative identity on the backward pass, and we further provide a theoretical justification. Our experiments demonstrate that this straightforward, hyper-parameter-free approach is on par with, or outperforms, previous more complex methods on numerous tasks such as the Traveling Salesman Problem, Shortest Path, Deep Graph Matching, and backpropagation through discrete samplers. Furthermore, we replace the previously proposed problem-specific and label-dependent margin with a generic regularization procedure that prevents cost collapse and increases robustness.