Prior research has explored the ability of computational models to predict a word's semantic fit with a given predicate. While much work has been devoted to modeling the typicality relation between verbs and arguments in isolation, in this paper we take a broader perspective by assessing whether, and to what extent, computational approaches have access to information about the typicality of entire events and situations described in language (Generalized Event Knowledge). Given the recent success of Transformer Language Models (TLMs), we decided to test them on a benchmark for the \textit{dynamic estimation of thematic fit}. The evaluation of these models was performed in comparison with SDM, a framework specifically designed to integrate events into sentence meaning representations, and we conducted a detailed error analysis to investigate which factors affect their behavior. Our results show that TLMs can reach performance comparable to that achieved by SDM. However, additional analysis consistently suggests that TLMs do not capture important aspects of event knowledge, and that their predictions often depend on surface linguistic features, such as frequent words, collocations, and syntactic patterns, thereby showing sub-optimal generalization abilities.