Psycholinguistic studies of human word processing and lexical access provide ample evidence of the preferred nature of word-initial versus word-final segments, e.g., in terms of attention paid by listeners (greater) or the likelihood of reduction by speakers (lower). This has led to the conjecture, as in Wedel et al. (2019b) but common elsewhere, that languages have evolved to provide more information earlier in words than later. Information-theoretic methods for establishing such tendencies in lexicons have suffered from several methodological shortcomings, which leave open the question of whether this high word-initial informativeness is actually a property of the lexicon or simply an artefact of the incremental nature of recognition. In this paper, we point out the confounds in existing methods for comparing the informativeness of segments early in the word versus later in the word, and present several new measures that avoid these confounds. Even after controlling for these confounds, we find evidence across hundreds of languages of a cross-linguistic tendency to front-load information in words.