The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has attracted considerable attention over the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization, whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences in natural languages often satisfy. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of $Rn$, where $n$ is the number of words of the sentence and $R$ is the number of samples; the larger $R$ is, the lower the estimation error but the higher the time cost. Here we present formulae to compute that expectation exactly in time of the order of $n$. Furthermore, we show that star trees maximize it, and give an algorithm to retrieve the trees that minimize it.
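To make the two baselines concrete, the following is a minimal Python sketch of the Monte Carlo estimate with cost of the order of $Rn$. It is an illustration under stated assumptions, not the procedure used in any particular study. The tree is assumed to be given as a dictionary mapping every word to the list of its dependents, and all function names are hypothetical. Projective shufflings are sampled by recursively shuffling each word together with the blocks spanned by its dependents, a common way of drawing projective arrangements uniformly at random. For comparison, the sketch also includes the standard closed form $(n^2-1)/3$ for the unconstrained baseline.

```python
import random


def random_projective_arrangement(children, root, rng):
    """Sample a projective left-to-right ordering of the rooted tree.

    Assumption: `children` maps every vertex to the list of its dependents.
    Each vertex and the contiguous blocks of its dependents' subtrees are
    shuffled uniformly, so subtrees stay contiguous and no edges cross.
    """
    def linearize(v):
        blocks = [[v]] + [linearize(c) for c in children[v]]
        rng.shuffle(blocks)
        return [u for block in blocks for u in block]

    return linearize(root)


def sum_of_dependency_distances(children, order):
    """Sum of |position(head) - position(dependent)| over all edges."""
    pos = {v: i for i, v in enumerate(order)}
    return sum(abs(pos[v] - pos[c]) for v in children for c in children[v])


def monte_carlo_projective_expectation(children, root, samples, seed=0):
    """Estimate the expected sum over `samples` random projective shufflings.

    The cost is of the order of samples * n, where n is the number of words.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        order = random_projective_arrangement(children, root, rng)
        total += sum_of_dependency_distances(children, order)
    return total / samples


def unconstrained_expectation(n):
    """Expected sum when all n! shufflings are equally likely: (n^2 - 1) / 3."""
    return (n * n - 1) / 3


if __name__ == "__main__":
    # Hypothetical example: a star tree with head 0 and dependents 1 and 2.
    star = {0: [1, 2], 1: [], 2: []}
    print(monte_carlo_projective_expectation(star, root=0, samples=20000))
    print(unconstrained_expectation(3))
```

For the three-word star tree in the example, enumerating the six projective shufflings by hand gives an exact expectation of $8/3$, so the estimate should approach that value as the number of samples grows. This illustrates the trade-off between error and time cost that an exact linear-time computation avoids.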