Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees

The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has attracted considerable attention over the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization, whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences in natural languages often satisfy. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of $Zn$, where $n$ is the number of words of the sentence and $Z$ is the number of samples; the larger $Z$, the lower the estimation error but the higher the time cost. Here we present formulae to compute that expectation without error in time of the order of $n$. Furthermore, we show that star trees maximize it, and devise a dynamic programming algorithm to retrieve the trees that minimize it.
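As a rough illustration of the Monte Carlo baseline described above, the following Python sketch samples uniformly random projective arrangements of a rooted tree and averages the sum of edge lengths over $Z$ samples. It assumes the tree is given as a hypothetical `children` dictionary mapping each vertex to the list of its children; all function names are illustrative and not taken from the paper. It is only the approximate procedure with cost of the order of $Zn$, not the exact linear-time method presented in this work.

```python
import random


def _place(children, v, rng, out):
    """Append the subtree rooted at v to `out` as one contiguous block.

    The vertex v is shuffled together with its children; every child is
    then expanded recursively into the block of its own subtree. Keeping
    each subtree contiguous is what makes the arrangement projective.
    Recursion depth equals tree height, fine for sentence-sized trees.
    """
    block = [v] + list(children.get(v, ()))
    rng.shuffle(block)
    for u in block:
        if u == v:
            out.append(v)
        else:
            _place(children, u, rng, out)


def random_projective_arrangement(children, root, rng):
    """Return the vertices of the tree in a random projective order."""
    out = []
    _place(children, root, rng, out)
    return out


def sum_edge_lengths(children, arrangement):
    """Sum of |position(parent) - position(child)| over all edges."""
    pos = {v: i for i, v in enumerate(arrangement)}
    return sum(abs(pos[p] - pos[c])
               for p, kids in children.items() for c in kids)


def monte_carlo_expected_sum(children, root, Z, seed=0):
    """Estimate the expected sum of edge lengths from Z random samples."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    total = 0
    for _ in range(Z):
        arrangement = random_projective_arrangement(children, root, rng)
        total += sum_edge_lengths(children, arrangement)
    return total / Z


if __name__ == "__main__":
    # Star tree on three vertices, rooted at the hub (vertex 0).
    star = {0: [1, 2]}
    print(monte_carlo_expected_sum(star, root=0, Z=10_000))
```

For the three-vertex star rooted at its hub, enumerating the six projective arrangements by hand gives an exact expectation of $8/3$, so the estimate should approach that value as $Z$ grows. This is the trade-off between accuracy and running time that an exact formula avoids.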
