Grammatical cues are sometimes redundant with word meanings in natural language. For instance, English word order rules constrain the word order of a sentence like "The dog chewed the bone" even though the status of "dog" as agent and "bone" as patient can be inferred from world knowledge and plausibility. Quantifying how often this redundancy occurs, and how the level of redundancy varies across typologically diverse languages, can shed light on the function and evolution of grammar. To that end, we performed a behavioral experiment in English and Russian and a cross-linguistic computational analysis measuring the redundancy of grammatical cues in transitive clauses extracted from naturalistic text. English and Russian speakers (n=484) were presented with subjects, verbs, and objects (in random order and with morphological markings removed) extracted from naturally occurring sentences and were asked to identify which noun is the agent of the action. Accuracy was high in both languages (~89% in English, ~87% in Russian). Next, we trained a neural network classifier on a similar task: predicting which nominal in a subject-verb-object triad is the subject. Across 30 languages from eight language families, performance was consistently high: a median accuracy of 87%, comparable to the accuracy observed in the human experiments. We conclude that grammatical cues such as word order are necessary to convey agenthood and patienthood in at most 10-15% of naturally occurring sentences; nevertheless, they (a) provide an important source of redundancy and (b) are crucial for conveying intended meaning that cannot be inferred from the words alone, including descriptions of human interactions, where roles are often reversible (e.g., "Ray helped Lu"/"Lu helped Ray"), and expressions of non-prototypical meanings (e.g., "The bone chewed the dog").