The problem of identifying a probabilistic context free grammar has two aspects: the first is determining the grammar's topology (the rules of the grammar) and the second is estimating probabilistic weights for each rule. Given the hardness results for learning context-free grammars in general, and probabilistic grammars in particular, most of the literature has concentrated on the second problem. In this work we address the first problem. We restrict attention to structurally unambiguous weighted context-free grammars (SUWCFG) and provide a query learning algorithm for \structurally unambiguous probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be represented using \emph{co-linear multiplicity tree automata} (CMTA), and provide a polynomial learning algorithm that learns CMTAs. We show that the learned CMTA can be converted into a probabilistic grammar, thus providing a complete algorithm for learning a structurally unambiguous probabilistic context free grammar (both the grammar topology and the probabilistic weights) using structured membership queries and structured equivalence queries. A summarized version of this work was published at AAAI 21.
翻译:确定概率背景自由语法的问题有两个方面:第一个方面是确定语法的地形学(语法规则),第二个方面是估计每种规则的概率比重。鉴于学习无背景语法的难度,特别是概率语法的难度,大多数文献都集中在第二个问题上。在这项工作中,我们处理第一个问题。我们把注意力限制在结构上明确的加权无背景语法(SUWCFG)上,为无背景语法提供一种查询学习算法(SUPCFG ) 。我们显示,根据SUWCFG可以使用emph{co-linear 多重树本型语法(CMTA)来代表普通语法,并提供一种多语法学习算法,学习CMTA。我们表明,我们所学的CMTA可以转换成一种完整的算法,用于学习结构上明确的逻辑背景背景无背景无背景语法的无背景语法(SUPOCFGMA) 。我们显示,使用这一结构上的Asqmarialalalal AS 和结构式的A 结构式的Amarialalalalaltialal 查询, 包括了A 结构上的Amatigraphyalalal 。