Recent work on predicting category structure with distributional models, using either static word embeddings (Heyman and Heyman, 2019) or contextualized language models (CLMs) (Misra et al., 2021), reports low correlations with human ratings, thus calling into question their plausibility as models of human semantic memory. In this work, we revisit this question, testing a wider array of methods for probing CLMs for typicality prediction. Our experiments with BERT (Devlin et al., 2018) show the importance of using the right type of CLM probe, as our best BERT-based typicality prediction methods substantially improve over previous work. Second, our results highlight the importance of polysemy in this task: our best results are obtained when using a disambiguation mechanism. Finally, additional experiments reveal that Information Content-based measures over WordNet (Miller, 1995), also endowed with disambiguation, match the performance of the best BERT-based method and, in fact, capture complementary information, which can be combined with BERT to achieve enhanced typicality predictions.
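The evaluation setting above, scoring a model's typicality predictions against human ratings, and the claim that two predictors can be combined when they capture complementary information, can be sketched as follows. This is a minimal illustration with made-up scores (the category members, human ratings, and model scores below are hypothetical, not the paper's data); Spearman rank correlation is the standard metric for this task, and rank averaging stands in for whatever combination scheme is actually used.

```python
from statistics import mean

def ranks(xs):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

# Hypothetical typicality scores for four members of the category BIRD
# (robin, sparrow, owl, penguin) -- illustrative numbers only.
human   = [6.8, 6.5, 4.2, 2.1]    # human typicality ratings
bert    = [0.92, 0.55, 0.88, 0.40]  # e.g. scores from a BERT probe
wordnet = [0.80, 0.70, 0.30, 0.45]  # e.g. Information Content-based scores

# Combine the two predictors by averaging their rank vectors.
combined = [(x + y) / 2 for x, y in zip(ranks(bert), ranks(wordnet))]

print(round(spearman(human, bert), 3))      # each predictor alone: 0.8
print(round(spearman(human, wordnet), 3))   # 0.8
print(round(spearman(human, combined), 3))  # combined: 1.0
```

In this toy example each predictor misorders a different pair of items, so averaging their ranks recovers the human ordering, mirroring the complementarity claim in the abstract.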