Quantifying the Roles of Visual, Linguistic, and Visual-Linguistic Complexity in Verb Acquisition

Children typically learn the meanings of nouns earlier than the meanings of verbs. However, it is unclear whether this asymmetry is a result of complexity in the visual structure of categories in the world to which language refers, the structure of language itself, or the interplay between the two sources of information. We quantitatively test these three hypotheses regarding early verb learning by employing visual and linguistic representations of words sourced from large-scale pre-trained artificial neural networks. Examining the structure of both visual and linguistic embedding spaces, we find, first, that the representation of verbs is generally more variable and less discriminable within domain than the representation of nouns. Second, we find that if only one learning instance per category is available, visual and linguistic representations are less well aligned in the verb system than in the noun system. However, in parallel with the course of human language development, if multiple learning instances per category are available, visual and linguistic representations become almost as well aligned in the verb system as in the noun system. Third, we compare the relative contributions of factors that may predict learning difficulty for individual words. A regression analysis reveals that visual variability is the strongest factor that internally drives verb learning, followed by visual-linguistic alignment and linguistic variability. Based on these results, we conclude that verb acquisition is influenced by all three sources of complexity, but that the variability of visual structure poses the most significant challenge for verb learning.
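
To make the notions of within-domain variability and discriminability concrete, the following is a minimal sketch rather than the analysis used in this work. It assumes that each learning instance is an embedding vector, that variability can be summarized as the mean cosine distance of instances to their category centroid, and that discriminability can be summarized as nearest-centroid accuracy. The function names and the synthetic data generator (`toy_embeddings`) are hypothetical and stand in for embeddings from a pre-trained network.

```python
import numpy as np


def unit_rows(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def category_centroids(embeddings, labels):
    """Return sorted category names and the unit-length mean embedding of each."""
    names = sorted(set(labels))
    labels = np.asarray(labels)
    cents = np.stack([embeddings[labels == n].mean(axis=0) for n in names])
    return names, unit_rows(cents)


def variability(embeddings, labels):
    """Mean cosine distance of instances to their own category centroid (assumed measure)."""
    emb = unit_rows(np.asarray(embeddings, dtype=float))
    names, cents = category_centroids(emb, labels)
    index = np.array([names.index(l) for l in labels])
    sims = np.sum(emb * cents[index], axis=1)
    return float(np.mean(1.0 - sims))


def discriminability(embeddings, labels):
    """Fraction of instances whose most similar centroid is their own category.

    Centroids are computed from the same instances (no held-out split),
    so this is an optimistic estimate.
    """
    emb = unit_rows(np.asarray(embeddings, dtype=float))
    names, cents = category_centroids(emb, labels)
    index = np.array([names.index(l) for l in labels])
    predicted = np.argmax(emb @ cents.T, axis=1)
    return float(np.mean(predicted == index))


def toy_embeddings(n_categories, n_instances, spread, dim=32, seed=0):
    """Hypothetical synthetic data: random category prototypes plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    prototypes = rng.normal(size=(n_categories, dim))
    labels = [f"c{i}" for i in range(n_categories) for _ in range(n_instances)]
    noise = rng.normal(scale=spread, size=(n_categories * n_instances, dim))
    return np.repeat(prototypes, n_instances, axis=0) + noise, labels


# A tightly clustered system (small spread) versus a loosely clustered one.
tight, tight_labels = toy_embeddings(10, 20, spread=0.3)
loose, loose_labels = toy_embeddings(10, 20, spread=3.0)
print("tight:", variability(tight, tight_labels), discriminability(tight, tight_labels))
print("loose:", variability(loose, loose_labels), discriminability(loose, loose_labels))
```

In this toy setup the loosely clustered categories play the role of a more variable system. Applying the same two functions to real visual or linguistic embeddings of nouns and verbs would be one way to compare the two word classes, under the assumptions stated above.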
