Probability distributions over spanning trees in directed graphs are a fundamental model of dependency structure in natural language processing, most notably of syntactic dependency trees. In NLP, dependency trees often carry an additional root constraint: only one edge may emanate from the root. However, no sampling algorithm that accounts for this constraint has been presented in the literature. In this paper, we adapt two spanning tree sampling algorithms to faithfully sample dependency trees from a graph subject to the root constraint. The sampling algorithm of Wilson (1996) has a running time of $\mathcal{O}(H)$, where $H$ is the mean hitting time of the graph. The sampling algorithm of Colbourn (1996) has a running time of $\mathcal{O}(N^3)$, where $N$ is the number of nodes, which is often greater than the mean hitting time of a directed graph. Additionally, we build upon Colbourn's algorithm and present a novel extension that can sample $K$ trees without replacement in $\mathcal{O}(K N^3 + K^2 N)$ time. To the best of our knowledge, no algorithm has previously been given for sampling spanning trees without replacement from a directed graph.
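For orientation, the following is a minimal sketch of the classic, unconstrained form of Wilson's loop-erased random walk sampler, written for rooted directed trees whose edges point away from the root. It is not the root-constrained adaptation described in this paper, and it does not enforce the single-root-edge constraint. The function name `wilson_sample`, the dense matrix layout `weights[h][d]`, and the head-list output are illustrative assumptions. The sketch also assumes that every non-root node has at least one incoming edge with positive weight and that the root can reach every node; otherwise the walk may fail or never terminate.

```python
import random


def wilson_sample(weights, root=0, rng=None):
    """Sample a spanning tree rooted at `root` (unconstrained sketch).

    weights[h][d] is the non-negative weight of the edge h -> d
    (hypothetical layout). Returns a list `head` where head[d] is the
    parent of node d and head[root] is None. Trees are drawn with
    probability proportional to the product of their edge weights.
    """
    rng = rng or random.Random()
    n = len(weights)
    in_tree = [False] * n
    head = [None] * n
    in_tree[root] = True

    for start in range(n):
        # Walk backwards along incoming edges until the current tree is hit.
        # Overwriting head[node] on each revisit erases loops implicitly.
        node = start
        while not in_tree[node]:
            candidates = [h for h in range(n) if h != node]
            probs = [weights[h][node] for h in candidates]
            head[node] = rng.choices(candidates, weights=probs, k=1)[0]
            node = head[node]

        # Retrace the loop-erased path and attach it to the tree.
        node = start
        while not in_tree[node]:
            in_tree[node] = True
            node = head[node]

    return head
```

Calling `wilson_sample(w, root=0)` on an `N`-by-`N` weight matrix `w` returns one head per non-root node. In this unconstrained sketch, several nodes may select the root as their head, which is exactly the situation the root constraint rules out.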