We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007). Drawing on studies of word acquisition in children, we evaluate multiple predictors for words' ages of acquisition in LSTMs, BERT, and GPT-2. We find that the effects of concreteness, word length, and lexical class are pointedly different in children and language models, reinforcing the importance of interaction and sensorimotor experience in child language acquisition. Language models rely far more on word frequency than children, but, like children, they exhibit slower learning of words in longer utterances. Interestingly, models follow consistent patterns during training regardless of directionality (unidirectional or bidirectional) and architecture (LSTM or Transformer): they predict based on unigram token frequencies early in training, transition loosely to bigram probabilities, and eventually converge on more nuanced predictions. These results shed light on the role of distributional learning mechanisms in children, while also providing insights for more human-like language acquisition in language models.
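To make the age-of-acquisition measure concrete, the sketch below shows one plausible way to reduce a word's learning curve to a single acquisition point: fit a sigmoid to the word's surprisal across training checkpoints and find where the fitted curve crosses the midpoint between chance-level surprisal and its minimum. This is a minimal illustration under stated assumptions, not the paper's released code; the arrays `steps` and `surprisals` and the function names are hypothetical.

```python
# Sketch: estimate a word's age of acquisition from its learning curve.
# Assumes `steps` (checkpoint training steps) and `surprisals` (the word's
# mean surprisal at each checkpoint) have already been extracted; the
# 50%-threshold criterion mirrors the description above.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, L, x0, k, b):
    # General logistic: asymptotes at b and b + L (L is negative here,
    # since surprisal falls over training).
    return L / (1.0 + np.exp(-k * (x - x0))) + b

def age_of_acquisition(steps, surprisals, chance_surprisal):
    """Fit a sigmoid to surprisal over log10(steps) and return the first
    checkpoint where the fitted curve falls below the midpoint between
    chance-level surprisal and the curve's minimum."""
    log_steps = np.log10(steps)
    p0 = [surprisals.min() - surprisals.max(),  # L: total drop (negative)
          np.median(log_steps),                 # x0: transition location
          1.0,                                  # k: slope
          surprisals.max()]                     # b: upper asymptote
    params, _ = curve_fit(sigmoid, log_steps, surprisals, p0=p0, maxfev=10000)
    fitted = sigmoid(log_steps, *params)
    threshold = (chance_surprisal + fitted.min()) / 2.0
    below = np.where(fitted <= threshold)[0]
    return steps[below[0]] if below.size else None
```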
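The claimed unigram-to-bigram trajectory suggests a simple diagnostic: at each checkpoint, compare the model's next-token distributions against corpus-estimated n-gram baselines. The sketch below, under assumed inputs, computes mean KL divergence to a unigram and a bigram baseline; early checkpoints would be expected to sit closest to the unigram baseline, with the bigram gap narrowing mid-training. All names here (`model_probs`, `unigram`, `bigram_probs`) are illustrative placeholders, not the paper's actual pipeline.

```python
# Sketch: probe whether a checkpoint's predictions track unigram vs. bigram
# statistics. `model_probs` is a hypothetical (n_positions, vocab) array of
# next-token distributions; `unigram` is a (vocab,) corpus unigram
# distribution; `bigram_probs` is a (n_positions, vocab) array of bigram
# conditionals aligned to the same positions.
import numpy as np

def mean_kl(p, q, eps=1e-12):
    """Mean KL(p || q) across positions; p and q are (n, vocab) arrays."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1))

def baseline_divergences(model_probs, unigram, bigram_probs):
    # Broadcast the single unigram distribution across all positions.
    uni = np.broadcast_to(unigram, model_probs.shape)
    return {
        "kl_to_unigram": mean_kl(model_probs, uni),
        "kl_to_bigram": mean_kl(model_probs, bigram_probs),
    }
```

Tracking these two divergences across checkpoints gives one quantitative reading of the transition from frequency-driven to context-sensitive predictions described above.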