Neural language models encode rich knowledge about entities and their relationships, which can be extracted from their representations using probing. Common properties of nouns (e.g., red strawberries, small ants) are, however, harder to extract than other types of knowledge because they are rarely stated explicitly in text. We hypothesize that this is mainly the case for perceptual properties, which are obvious to the participants in a communication. We propose to extract these properties from images and use them in an ensemble model to complement the information extracted from language models. We consider perceptual properties to be more concrete than abstract properties (e.g., interesting, flawless), and we therefore use the adjectives' concreteness score as a lever to calibrate the contribution of each source (text vs. images). We evaluate our ensemble model in a ranking task where the actual properties of a noun need to be ranked higher than other, non-relevant properties. Our results show that the proposed combination of text and images greatly improves noun property prediction compared to powerful text-based language models.
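To make the idea of concreteness as a lever more tangible, the following is a minimal sketch of one way such a weighting could work; it is not the model evaluated in this work. It assumes that each candidate adjective has a text-based score, an image-based score, and a concreteness score normalized to [0, 1], and it combines the two sources by linear interpolation. The function name `ensemble_rank`, the interpolation rule, and all numbers are hypothetical and chosen only for illustration.

```python
def ensemble_rank(text_scores, image_scores, concreteness):
    """Rank candidate adjectives for a single noun.

    Assumption (not taken from the abstract): the combined score is a
    linear interpolation in which a more concrete adjective relies more
    on the image-based score and a more abstract one on the text-based
    score. All inputs are dicts keyed by adjective; concreteness values
    are assumed to lie in [0, 1].
    """
    combined = {}
    for adjective, text_score in text_scores.items():
        weight = concreteness[adjective]
        combined[adjective] = (
            weight * image_scores[adjective] + (1.0 - weight) * text_score
        )
    # Highest combined score first.
    return sorted(combined, key=combined.get, reverse=True)


# Toy, made-up scores for the noun "strawberry".
text_scores = {"red": 0.2, "interesting": 0.6, "small": 0.3}
image_scores = {"red": 0.9, "interesting": 0.1, "small": 0.4}
concreteness = {"red": 0.9, "interesting": 0.2, "small": 0.7}

text_only = sorted(text_scores, key=text_scores.get, reverse=True)
ranking = ensemble_rank(text_scores, image_scores, concreteness)
print("text only:", text_only)
print("ensemble: ", ranking)
```

In this toy setting the text-only ranking puts the abstract adjective first, while the weighted combination moves the concrete, visually obvious property "red" to the top. An actual system could just as well combine ranks rather than raw scores, or learn the weighting.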