The primary use of any probabilistic model involving a set of random variables is to run inference and sampling queries on it. Inference queries in classical probabilistic models are concerned with the computation of marginal or conditional probabilities of events given as input. When the probabilistic model is sequential, more sophisticated marginal inference queries involving complex grammars may be of interest in fields such as computational linguistics and NLP. In this work, we address the problem of computing the likelihood of context-free grammars (CFGs) in Hidden Markov Models (HMMs). We provide a dynamic programming algorithm for the exact computation of this likelihood for the class of unambiguous context-free grammars. We show that the problem is NP-hard, even under the promise that the input CFG has a degree of ambiguity at most 2. We then propose a fully polynomial randomized approximation scheme (FPRAS) to approximate the likelihood for CFGs with polynomially bounded ambiguity.
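The abstract does not spell out the exact algorithm, so the following is only an illustrative sketch of one standard dynamic program for this kind of likelihood query: a CYK-style intersection of a CFG in Chomsky normal form with an HMM over strings of a fixed length n. All names here (`cfg_likelihood` and the dictionary encodings of `pi`, `T`, `E`, and the rule lists) are hypothetical, and the sketch relies on the grammar being unambiguous so that summing over derivations counts each string exactly once.

```python
from collections import defaultdict

def cfg_likelihood(n, pi, T, E, binary_rules, terminal_rules, start):
    """
    Probability that a length-n string drawn from an HMM lies in L(G),
    for an unambiguous CFG G in Chomsky normal form. A sketch of the
    dynamic-programming idea only; the representation is illustrative.

    pi:  dict state -> initial probability
    T:   dict (q, q') -> transition probability
    E:   dict (q, a)  -> emission probability
    binary_rules:   iterable of (A, B, C) for rules A -> B C
    terminal_rules: iterable of (A, a)    for rules A -> a
    """
    states = list(pi)
    # W[(A, l, q, q')] = total HMM weight of the strings of length l
    # derivable from A, summed over state paths that start in q (emitting
    # the first symbol) and end in q' (emitting the last symbol).
    # Unambiguity guarantees each string contributes exactly once.
    W = defaultdict(float)
    for A, a in terminal_rules:               # base case: spans of length 1
        for q in states:
            W[(A, 1, q, q)] += E.get((q, a), 0.0)
    for l in range(2, n + 1):                 # longer spans, shortest first
        for A, B, C in binary_rules:
            for k in range(1, l):             # split point: B covers k symbols
                for q in states:
                    for r in states:
                        wB = W[(B, k, q, r)]
                        if wB == 0.0:
                            continue
                        for r2 in states:
                            t = T.get((r, r2), 0.0)
                            if t == 0.0:
                                continue
                            for q2 in states:
                                W[(A, l, q, q2)] += wB * t * W[(C, l - k, r2, q2)]
    # Full derivations from the start symbol, summed over the initial
    # and final HMM states.
    return sum(pi[q] * W[(start, n, q, q2)]
               for q in states for q2 in states)
```

A quick sanity check on a grammar whose language is the single string "ab" (rules S -> A B, A -> a, B -> b): the result should equal the HMM probability of emitting "ab" directly.

```python
pi = {0: 0.5, 1: 0.5}
T = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}
E = {(0, 'a'): 0.7, (0, 'b'): 0.3, (1, 'a'): 0.1, (1, 'b'): 0.9}
# Both lines print 0.165.
print(cfg_likelihood(2, pi, T, E, [('S', 'A', 'B')], [('A', 'a'), ('B', 'b')], 'S'))
print(sum(pi[q] * E[(q, 'a')] * T[(q, q2)] * E[(q2, 'b')]
          for q in pi for q2 in pi))
```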