The integration of discrete algorithmic components in deep learning architectures has numerous applications. Recently, Implicit Maximum Likelihood Estimation (IMLE, Niepert, Minervini, and Franceschi 2021), a class of gradient estimators for discrete exponential family distributions, was proposed by combining implicit differentiation through perturbation with the path-wise gradient estimator. However, due to the finite difference approximation of the gradients, it is especially sensitive to the choice of the finite difference step size, which needs to be specified by the user. In this work, we present Adaptive IMLE (AIMLE), the first adaptive gradient estimator for complex discrete distributions: it adaptively identifies the target distribution for IMLE by trading off the density of gradient information with the degree of bias in the gradient estimates. We empirically evaluate our estimator on synthetic examples, as well as on Learning to Explain, Discrete Variational Auto-Encoders, and Neural Relational Inference tasks. In our experiments, we show that our adaptive gradient estimator can produce faithful estimates while requiring orders of magnitude fewer samples than other gradient estimators.
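To make the finite-difference mechanism concrete, below is a minimal sketch of an IMLE-style gradient estimate for a simple case. It assumes a "top-1" exponential family distribution whose MAP state is just the argmax, and the helper names (`map_state`, `imle_gradient`) and the parameter `lam` (the finite-difference step size) are illustrative, not the authors' implementation:

```python
import numpy as np

def map_state(theta):
    # Hypothetical MAP oracle for a one-hot ("top-1") exponential family
    # distribution: the MAP state is the argmax, returned as a one-hot vector.
    z = np.zeros_like(theta)
    z[np.argmax(theta)] = 1.0
    return z

def imle_gradient(theta, grad_z, lam, rng):
    # Perturb-and-MAP forward sample: MAP of theta plus Gumbel noise.
    eps = rng.gumbel(size=theta.shape)
    z = map_state(theta + eps)
    # Finite-difference backward pass: compare the MAP state under the
    # original parameters with the MAP state under parameters nudged
    # towards a target distribution theta - lam * grad_z (same noise).
    z_target = map_state(theta - lam * grad_z + eps)
    return (z - z_target) / lam

rng = np.random.default_rng(0)
theta = np.array([0.2, 1.5, -0.3])
grad_z = np.array([1.0, -1.0, 0.0])   # upstream gradient dL/dz
for lam in (0.01, 1.0, 100.0):        # estimates vary strongly with lam
    print(lam, imle_gradient(theta, grad_z, lam, rng))
```

The sketch illustrates the sensitivity the abstract describes: with a very small step size the two MAP states almost always coincide and the estimate is zero (sparse gradient information), while a large step size yields dense but strongly biased estimates. Trading off these two regimes adaptively, rather than via a user-specified step size, is what AIMLE automates.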