Grasping the commonsense properties of everyday concepts is an important prerequisite to language understanding. While contextualised language models are reportedly capable of predicting such commonsense properties with human-level accuracy, we argue that such results have been inflated because of the high similarity between training and test concepts. This means that models which capture concept similarity can perform well, even if they do not capture any knowledge of the commonsense properties themselves. In settings where there is no overlap between the properties that are considered during training and testing, we find that the empirical performance of standard language models drops dramatically. To address this, we study the possibility of fine-tuning language models to explicitly model concepts and their properties. In particular, we train separate concept and property encoders on two types of readily available data: extracted hyponym-hypernym pairs and generic sentences. Our experimental results show that the resulting encoders allow us to predict commonsense properties with much higher accuracy than is possible by directly fine-tuning language models. We also present experimental results for the related task of unsupervised hypernym discovery.
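To make the bi-encoder idea concrete, the sketch below illustrates one way separate concept and property encoders could score whether a commonsense property holds for a concept. The base model name, mean pooling, and sigmoid dot-product scoring are assumptions for illustration only, not the configuration used in the paper.

```python
# Illustrative bi-encoder sketch (assumptions: bert-base-uncased backbone,
# mean pooling, sigmoid dot-product scoring); not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-uncased"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
concept_encoder = AutoModel.from_pretrained(MODEL_NAME)   # encodes concepts, e.g. "banana"
property_encoder = AutoModel.from_pretrained(MODEL_NAME)  # encodes properties, e.g. "is yellow"

def embed(texts, encoder):
    """Mean-pool the final hidden states into one vector per input phrase."""
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state       # (batch, seq, dim)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # ignore padding tokens
    return (hidden * mask).sum(1) / mask.sum(1)

concepts = ["banana", "crow"]
properties = ["is yellow", "can fly"]

c_vecs = embed(concepts, concept_encoder)
p_vecs = embed(properties, property_encoder)

# Higher score = the property is predicted to hold for the concept.
scores = torch.sigmoid((c_vecs * p_vecs).sum(-1))
for concept, prop, score in zip(concepts, properties, scores):
    print(f"{concept} -- {prop}: {score.item():.3f}")
```

In practice the two encoders would be fine-tuned on supervision such as extracted hyponym-hypernym pairs and generic sentences before being used for property prediction.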