Unbiased and Efficient Sampling of Dependency Trees

Distributions over spanning trees are the most common way to computationally model dependency syntax. However, most treebanks require that every valid dependency tree have a single edge coming out of the ROOT node, a constraint that is not part of the definition of spanning trees. For this reason, all standard inference algorithms for spanning trees are suboptimal for modeling dependency trees. Zmigrod et al. (2021b) have recently proposed algorithms for sampling with and without replacement from the single-root dependency tree distribution. In this paper we show that their fastest algorithm for sampling with replacement, Wilson-RC, in fact produces biased samples, and we provide two unbiased alternatives. Additionally, we propose two algorithms (one incremental, one parallel) that reduce the asymptotic runtime of their algorithm for sampling $k$ trees without replacement to $\mathcal{O}(kn^3)$. These algorithms are more efficient both asymptotically and in practice.
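To make the single-root constraint concrete, here is a minimal Python sketch of one simple unbiased baseline. It draws an unconstrained dependency tree with Wilson's loop-erased random-walk algorithm, so each tree has probability proportional to the product of its arc weights, and then rejects any tree with more than one arc leaving ROOT. This is a generic rejection sampler, not Wilson-RC and not either of the paper's proposed algorithms. The function names `wilson_sample` and `sample_single_root` and the weight convention (`weights[h][d] > 0` scores the arc h → d, node 0 is ROOT) are hypothetical choices made for illustration.

```python
import random


def wilson_sample(weights, rng):
    """Sample a dependency tree, ignoring the single-root constraint.

    Hypothetical convention: weights[h][d] > 0 scores the arc h -> d, node 0 is ROOT.
    Returns head, where head[d] is the head of word d and head[0] is None.
    Each tree is drawn with probability proportional to prod_d weights[head[d]][d].
    """
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True  # ROOT starts as the only node in the tree
    head = [None] * n
    for start in range(1, n):
        # Random walk from `start` toward the tree. Overwriting head[u] on every
        # visit keeps only the last exit from u, which erases any loops.
        u = start
        while not in_tree[u]:
            candidates = [h for h in range(n) if h != u]
            scores = [weights[h][u] for h in candidates]
            head[u] = rng.choices(candidates, weights=scores)[0]
            u = head[u]
        # Follow the loop-erased path again and attach it to the tree.
        u = start
        while not in_tree[u]:
            in_tree[u] = True
            u = head[u]
    return head


def sample_single_root(weights, rng, max_tries=100_000):
    """Rejection sampling: redraw until exactly one arc leaves ROOT."""
    for _ in range(max_tries):
        head = wilson_sample(weights, rng)
        if sum(1 for h in head[1:] if h == 0) == 1:
            return head
    raise RuntimeError("no single-root tree accepted; ROOT arcs may dominate")


if __name__ == "__main__":
    # Toy 3-word sentence: weights[h][d] for h != d; column 0 and the diagonal are unused.
    W = [
        [0.0, 2.0, 1.0, 0.5],
        [0.0, 0.0, 3.0, 1.0],
        [0.0, 1.0, 0.0, 2.0],
        [0.0, 0.5, 1.5, 0.0],
    ]
    print(sample_single_root(W, random.Random(0)))
```

Accepted samples follow the single-root distribution exactly, because rejection only conditions the unconstrained tree distribution on the event "one arc leaves ROOT". The drawback is cost: the expected number of draws is 1 / P(single root) under the unconstrained distribution, which becomes large when arcs from ROOT carry high weight.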
