Large language models are known to suffer from the hallucination problem: they are prone to outputting statements that are false or inconsistent, indicating a lack of knowledge. A proposed solution to this is to provide the model with additional data modalities that complement the knowledge obtained through text. We investigate the use of visual data to complement the knowledge of large language models by proposing a method for evaluating visual knowledge transfer to text for uni- or multimodal language models. The method is based on two steps: 1) a novel task querying for knowledge of memory colors, i.e., the typical colors of well-known objects, and 2) filtering of the models' training data to clearly separate knowledge contributions. Additionally, we introduce a model architecture that involves a visual imagination step and evaluate it with our proposed method. We find that our method can successfully be used to measure visual knowledge transfer capabilities in models and that our novel model architecture shows promising results for leveraging multimodal knowledge in a unimodal setting.
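To make the memory-color querying step concrete, the sketch below shows one way such a probe could be posed to a masked language model as a cloze task. This is a minimal illustration, not the paper's exact setup: the prompt template, object list, and color vocabulary are illustrative assumptions.

```python
# Minimal sketch of a cloze-style memory-color probe for a masked language model.
# Assumptions: prompt template, objects, and color set are placeholders for illustration.
from transformers import pipeline

COLORS = ["red", "orange", "yellow", "green", "blue",
          "purple", "brown", "black", "white", "pink", "gray"]
# Hypothetical object -> expected memory color pairs.
OBJECTS = {"banana": "yellow", "grass": "green", "snow": "white"}

fill = pipeline("fill-mask", model="bert-base-uncased")

correct = 0
for obj, expected in OBJECTS.items():
    prompt = f"The color of {obj} is [MASK]."
    # Restrict predictions to the color vocabulary and take the top-scoring candidate.
    predictions = fill(prompt, targets=COLORS)
    predicted = predictions[0]["token_str"].strip()
    correct += int(predicted == expected)

print(f"Memory-color accuracy: {correct / len(OBJECTS):.2f}")
```

Restricting the prediction to a fixed color vocabulary keeps the task a forced-choice query, so accuracy reflects whether the model's textual knowledge encodes the object's typical color rather than its general fluency.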