Deep discrete structured models have seen considerable progress recently, but traditional inference with dynamic programming (DP) typically handles only a small number of states (fewer than a few hundred), which severely limits model capacity. At the same time, across machine learning there is a recent trend of using randomized truncation techniques to accelerate computations involving large sums. Here, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition function, marginals, reparameterization, entropy) and to different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation: it can be integrated with neural networks seamlessly and trained with gradient-based optimizers. Our core technique approximates the sum-product operation by restricting and reweighting DP on a small subset of nodes, which reduces computation by orders of magnitude. We further achieve low bias and variance via Rao-Blackwellization and importance sampling. Experiments on different graph structures demonstrate the accuracy and efficiency of our approach. Furthermore, when using RDP to train a structured variational autoencoder with a scaled inference network, we achieve better test likelihood than baselines and successfully prevent posterior collapse. Code at: https://github.com/FranxYao/RDP