Most computational models of dependency syntax consist of distributions over spanning trees. However, the majority of dependency treebanks require that every valid dependency tree has a single edge coming out of the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason all standard inference algorithms for spanning trees are suboptimal for inference over dependency trees. Zmigrod et al. (2021b) proposed algorithms for sampling with and without replacement from the dependency tree distribution that incorporate the single-root constraint. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, is in fact producing biased samples and we provide two alternatives that are unbiased. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of algorithm for sampling k trees without replacement to O(kn3). These algorithms are both asymptotically and practically more efficient.
翻译:多数依赖性语法的计算模型都是分布在横贯树木上的分布。 但是,大多数依赖性树库都要求每棵有效的依赖性树从ROOT节流出一个单一的边缘,这一限制不属于横贯树木定义的一部分。因此,横贯树木的所有标准推算法对于推断依赖性树的推断不尽如人意。Zmigrod 等人(2021b) 提出了用于取样的算法,用含有单根限制的依赖性树分布取而不用替换。 在本文中,我们表明,他们使用替代物取样的速度最快,威尔逊-RC(Wilson-RC),实际上是产生偏差的样本,我们提供了两种不偏不倚的替代方法。此外,我们提出了两种算法(一种递增法,一种平行法),可以减少在不替换O(kn3)的情况下采样K树的算法的无规律运行时间。 这些算法既简单又实际效率更高。