Large-scale vision-language pre-trained (VLP) models are prone to hallucinating non-existent visual objects when generating text based on visual information. In this paper, we exhaustively probe the object hallucination problem from three aspects. First, we examine various state-of-the-art VLP models, showing that models achieving better scores on standard metrics (e.g., BLEU-4, CIDEr) may hallucinate objects more frequently. Second, we investigate how different types of visual features in VLP influence hallucination, including region-based, grid-based, and patch-based features. Surprisingly, we find that patch-based features perform best, and that smaller patch resolution yields a non-trivial reduction in object hallucination. Third, we decouple various VLP objectives and demonstrate their effectiveness in alleviating object hallucination. Based on these findings, we propose a new pre-training loss, object masked language modeling, to further reduce object hallucination. We evaluate models on both the COCO (in-domain) and NoCaps (out-of-domain) datasets with our improved CHAIR metric. Furthermore, we investigate the effects of various text decoding strategies and image augmentation methods on object hallucination.
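For readers unfamiliar with the metric, CHAIR (Rohrbach et al., 2018) quantifies hallucination by comparing the objects mentioned in a generated caption against the ground-truth object annotations of the image. The minimal sketch below shows the two standard variants, CHAIRi (fraction of hallucinated object mentions) and CHAIRs (fraction of captions containing at least one hallucinated object); the `extract_objects` helper, which maps a caption to the set of object categories it mentions (typically via a synonym lexicon over the 80 COCO categories), is an assumed placeholder, and the paper's improved variant is not reproduced here.

```python
def chair_scores(captions, gt_objects, extract_objects):
    """Sketch of CHAIRi and CHAIRs (Rohrbach et al., 2018).

    captions:        list[str], generated captions (one per image)
    gt_objects:      list[set[str]], ground-truth object labels per image
    extract_objects: callable(str) -> set[str], maps a caption to the
                     object categories it mentions (assumed helper)
    """
    total_mentions = 0    # all object mentions across captions
    halluc_mentions = 0   # mentions with no matching ground-truth object
    halluc_captions = 0   # captions containing >= 1 hallucinated object
    for caption, gt in zip(captions, gt_objects):
        mentioned = extract_objects(caption)
        hallucinated = mentioned - gt  # mentioned but absent from the image
        total_mentions += len(mentioned)
        halluc_mentions += len(hallucinated)
        halluc_captions += int(bool(hallucinated))
    chair_i = halluc_mentions / max(total_mentions, 1)
    chair_s = halluc_captions / max(len(captions), 1)
    return chair_i, chair_s
```

Lower is better for both scores; note that this sketch counts each unique object category once per caption, whereas implementation details such as instance-level counting and synonym handling vary across CHAIR implementations.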