Structured distributions, i.e., distributions over combinatorial spaces, are commonly used to learn latent probabilistic representations from observed data. However, scaling these models is bottlenecked by their high computational and memory complexity with respect to the size of the latent representation. Common models such as Hidden Markov Models (HMMs) and Probabilistic Context-Free Grammars (PCFGs) require time and space quadratic and cubic in the number of hidden states, respectively. This work demonstrates a simple approach to reducing the computational and memory complexity of a large class of structured models. We show that by viewing the central inference step as a matrix-vector product and imposing a low-rank constraint, we can trade off model expressivity and speed via the rank. Experiments with neural-parameterized structured models for language modeling, polyphonic music modeling, unsupervised grammar induction, and video modeling show that our approach matches the accuracy of standard models at large state spaces while providing practical speedups.
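As a rough illustration of the idea rather than the paper's exact implementation, the sketch below contrasts an ordinary HMM forward step, which multiplies the belief vector by a full N x N transition matrix, with a low-rank variant that factors the transition matrix as U V^T with U, V of shape N x r. Associating the matrix-vector product as (alpha U) V^T avoids ever forming the full matrix and reduces the per-step cost from O(N^2) to O(N r). All names and sizes are illustrative, and log-space/normalization details are omitted.

```python
import numpy as np

def forward_step_full(alpha, T, emit):
    """One HMM forward step with a full N x N transition matrix: O(N^2)."""
    return (alpha @ T) * emit          # marginalize over the previous state, weight by emission

def forward_step_lowrank(alpha, U, V, emit):
    """Same step with T approximated as U @ V.T, where U, V are N x r: O(N * r)."""
    return ((alpha @ U) @ V.T) * emit  # associate the product so U @ V.T is never materialized

# Tiny usage example with illustrative sizes (scores left unnormalized for brevity).
rng = np.random.default_rng(0)
N, r = 512, 16
U, V = rng.random((N, r)), rng.random((N, r))
T = U @ V.T                            # rank-r transition scores
alpha, emit = rng.random(N), rng.random(N)

assert np.allclose(forward_step_full(alpha, T, emit),
                   forward_step_lowrank(alpha, U, V, emit))
```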