The combinatorial power of language has historically been argued to be enabled by syntax: rules that allow words to combine hierarchically to convey complex meanings. But how important are these rules in practice? We performed a broad-coverage cross-linguistic investigation of the importance of grammatical cues for interpretation. First, English and Russian speakers (n=484) were presented with subjects, verbs, and objects (in random order and with morphological markings removed) extracted from naturally occurring sentences, and were asked to identify which noun is the agent of the action. Accuracy was high in both languages (~89% in English, ~87% in Russian), suggesting that word meanings strongly constrain who is doing what to whom. Next, we trained a neural network machine classifier on a similar task: predicting which nominal in a subject-verb-object triad is the subject. Across 30 languages from eight language families, performance was consistently high: a median accuracy of 87%, comparable to the accuracy observed in the human experiments. These results have ramifications for any theory of why languages look the way that they do, and seemingly pose a challenge for efficiency-based theories: why have grammatical cues for argument role if they only have utility in 10-15% of sentences? We suggest that although grammatical cues are not usually necessary, they are useful in the rare cases when the intended meaning cannot be inferred from the words alone, including descriptions of human interactions, where roles are often reversible (e.g., Ray helped Lu/Lu helped Ray), and expressing non-canonical meanings (e.g., the man bit the dog). Importantly, for such cues to be useful, they have to be reliable, which means being ubiquitously used, including when they are not needed.