Structured latent variables allow meaningful prior knowledge to be incorporated into deep learning models. However, learning with such variables remains challenging because of their discrete nature. Nowadays, the standard learning approach is to define a latent variable as a perturbed algorithm output and to use a differentiable surrogate for training. In general, the surrogate puts additional constraints on the model and inevitably leads to biased gradients. To alleviate these shortcomings, we extend the Gumbel-Max trick to define distributions over structured domains. We avoid differentiable surrogates by leveraging score function estimators for optimization. In particular, we highlight a family of recursive algorithms with a common feature we call the stochastic invariant. This feature allows us to construct reliable gradient estimates and control variates without additional constraints on the model. In our experiments, we consider various structured latent variable models and achieve results competitive with relaxation-based counterparts.
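For context, a minimal NumPy sketch of the standard Gumbel-Max trick that the abstract extends: a categorical sample can be drawn by taking the argmax of log-probabilities perturbed with i.i.d. Gumbel(0, 1) noise. The function name, probabilities, and sanity check below are illustrative assumptions, not the paper's code.

```python
import numpy as np

def gumbel_max_sample(log_probs, rng):
    """Draw one categorical sample via the Gumbel-Max trick:
    argmax of log-probabilities perturbed by i.i.d. Gumbel(0, 1) noise."""
    gumbel_noise = rng.gumbel(size=log_probs.shape)
    return int(np.argmax(log_probs + gumbel_noise))

# Sanity check: empirical frequencies should match the target distribution.
rng = np.random.default_rng(0)
probs = np.array([0.1, 0.2, 0.3, 0.4])
samples = [gumbel_max_sample(np.log(probs), rng) for _ in range(100_000)]
print(np.bincount(samples, minlength=4) / len(samples))  # approx. [0.1, 0.2, 0.3, 0.4]
```

The paper applies this perturb-and-argmax construction recursively to algorithms over structured domains, rather than to a single categorical variable as in this sketch.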