Energy-based models, a.k.a. energy networks, perform inference by optimizing an energy function, typically parametrized by a neural network. This allows one to capture potentially complex relationships between inputs and outputs. To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function. The key challenge for training energy networks lies in computing loss gradients, as this usually requires argmin/argmax differentiation. In this paper, building upon a generalized notion of the conjugate function, which replaces the usual bilinear pairing with a general energy function, we propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks. Our losses enjoy many desirable properties and their gradients can be computed efficiently, without argmin/argmax differentiation. We also prove the calibration of their excess risk in the case of linear-concave energies. We demonstrate our losses on multilabel classification and imitation learning tasks.
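To make the construction concrete, here is a minimal sketch of the idea described above; the symbols ($v$ for the conjugate's argument, $\mu$ for outputs, $\Phi$ for the energy, $\Omega$ for a regularizer) are our own choice of notation, not fixed by the abstract. Replacing the bilinear pairing $\langle v, \mu \rangle$ of the usual Fenchel conjugate with a general energy $\Phi(v, \mu)$ yields a generalized conjugate and an associated loss that is nonnegative by construction:
\[
\Omega^{\Phi}(v) := \max_{\mu \in \mathcal{M}} \; \Phi(v, \mu) - \Omega(\mu),
\qquad
L_{\Phi,\Omega}(v, \mu') := \Omega^{\Phi}(v) + \Omega(\mu') - \Phi(v, \mu').
\]
By the envelope (Danskin) theorem, if $\mu^\star(v)$ attains the inner maximum, the loss gradient only requires evaluating $\nabla_v \Phi$ at the maximizer, with no differentiation through the $\operatorname{arg\,max}$ map itself:
\[
\nabla_v L_{\Phi,\Omega}(v, \mu') = \nabla_v \Phi\big(v, \mu^\star(v)\big) - \nabla_v \Phi(v, \mu').
\]
This is the sense in which loss gradients avoid argmin/argmax differentiation; the linear-concave case mentioned above plausibly corresponds to $\Phi$ being concave in $\mu$, so that the inner maximization defining $\Omega^{\Phi}$ is tractable.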