The scale of the state space of discrete graphical models is crucial for model capacity in the era of deep learning. Existing dynamic programming (DP) based inference typically works with a small number of states (usually fewer than a few hundred). In this work, we propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states. Our method is widely applicable to classical DP-based inference (partition function, marginals, reparameterization, entropy, etc.) and to different graph structures (chains, trees, and more general hypergraphs). It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly and learned with gradient-based optimizers. Our core technique is randomization: we restrict and reweight the DP on a small sampled subset of nodes, reducing computation by orders of magnitude. We further achieve low bias and variance with Rao-Blackwellization and importance sampling. Experiments on different inference tasks over different graphs demonstrate the accuracy and efficiency of our method. Furthermore, a structured VAE scaled up and trained with RDP outperforms baselines in terms of test likelihood and successfully prevents posterior collapse.
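The restrict-and-reweight idea can be made concrete with a small sketch. The following is a minimal illustration, not the paper's exact estimator: it estimates the log partition function of a chain MRF by replacing each forward-algorithm sum over all K predecessor states with an importance-sampled sum over m states drawn with replacement from a proposal proportional to the current forward scores. The function name `rdp_partition`, the proposal choice, and the uniform initial scores are assumptions made for this demo, and the Rao-Blackwellization step mentioned above is omitted. The resulting estimate of Z is unbiased, though its logarithm (what the function returns) carries a small downward Jensen bias.

```python
import numpy as np
from scipy.special import logsumexp

def rdp_partition(log_phi, m, rng):
    """Estimate log Z of a chain MRF with a restricted, reweighted forward pass.

    log_phi: (T-1, K, K) pairwise log-potentials; log_phi[t, i, j] scores
             state i at position t followed by state j at position t+1.
    m:       number of predecessor states sampled per step (m << K).

    Each forward step sums over m sampled states instead of all K, and each
    sampled term is reweighted by 1 / (m * q_i), which keeps the estimate of
    Z unbiased while cutting the per-step cost from O(K^2) to O(m * K).
    """
    _, K, _ = log_phi.shape
    log_alpha = np.zeros(K)  # uniform initial scores (a demo assumption)
    for t in range(log_phi.shape[0]):
        # Proposal proportional to the current forward scores.
        q = np.exp(log_alpha - log_alpha.max())
        q /= q.sum()
        idx = rng.choice(K, size=m, p=q)             # sample with replacement
        log_w = log_alpha[idx] - np.log(m * q[idx])  # importance reweighting
        log_alpha = logsumexp(log_w[:, None] + log_phi[t, idx, :], axis=0)
    return logsumexp(log_alpha)

# Usage: compare against the exact O(T * K^2) forward algorithm.
rng = np.random.default_rng(0)
T, K, m = 16, 500, 64
log_phi = 0.1 * rng.normal(size=(T - 1, K, K))
log_alpha = np.zeros(K)
for t in range(T - 1):
    log_alpha = logsumexp(log_alpha[:, None] + log_phi[t], axis=0)
print("exact:", logsumexp(log_alpha), "RDP:", rdp_partition(log_phi, m, rng))
```

Because every operation in the sketch is a differentiable composition of `logsumexp` and indexing, the same restricted recursion could be written in an autodiff framework and trained end to end, which is the compatibility with gradient-based optimizers claimed above.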