When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities. Is this preference due to a learning bias for hierarchical structure, or due to more general biases that interact with hierarchical cues in children's linguistic input? We explore these possibilities by training LSTMs and Transformers - two types of neural networks without a hierarchical bias - on data similar in quantity and content to children's linguistic input: text from the CHILDES corpus. We then evaluate what these models have learned about English yes/no questions, a phenomenon for which hierarchical structure is crucial. We find that, though they perform well at capturing the surface statistics of child-directed speech (as measured by perplexity), both model types generalize in a way more consistent with an incorrect linear rule than the correct hierarchical rule. These results suggest that human-like generalization from text alone requires stronger biases than the general sequence-processing biases of standard neural network architectures.
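To make the evaluation idea concrete, the sketch below (not the paper's actual implementation) shows how one might compare the log-probability a trained language model assigns to a yes/no question formed by the correct hierarchical rule versus the incorrect linear rule. The tiny LSTM here is untrained and the vocabulary, hyperparameters, and sentence pair are illustrative assumptions; in the study itself the models were trained on CHILDES text and surface fit was measured with perplexity.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumptions: toy untrained LSTM, illustrative vocabulary).
# The real evaluation would use a model trained on child-directed speech.

class LstmLm(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, ids):
        h, _ = self.lstm(self.embed(ids))
        return self.out(h)  # next-token logits at each position

def sentence_logprob(model, vocab, sentence):
    """Sum of log P(w_t | w_<t); higher (less negative) means the LM prefers it."""
    ids = torch.tensor([[vocab[w] for w in sentence.split()]])
    with torch.no_grad():
        logits = model(ids[:, :-1])
        logprobs = torch.log_softmax(logits, dim=-1)
        targets = ids[:, 1:]
        token_lp = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp.sum().item()

# Hypothetical minimal pair: the hierarchical rule fronts the main-clause
# auxiliary; the linear rule fronts the linearly first auxiliary.
hierarchical = "<s> is the boy who is reading happy ? </s>"
linear = "<s> is the boy who reading is happy ? </s>"

words = sorted(set((hierarchical + " " + linear).split()))
vocab = {w: i for i, w in enumerate(words)}

model = LstmLm(vocab_size=len(vocab))
print("hierarchical rule:", sentence_logprob(model, vocab, hierarchical))
print("linear rule:      ", sentence_logprob(model, vocab, linear))
```

A model with a human-like preference should assign higher probability to the hierarchical variant across many such minimal pairs; the finding reported above is that LSTMs and Transformers trained only on CHILDES-scale text tend not to show this preference.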