Image Difference Captioning (IDC) aims at generating sentences that describe the differences between two similar-looking images. Conventional approaches learn an IDC model on top of a pre-trained and usually frozen visual feature extractor. Accordingly, two major issues may arise: (1) a large domain gap usually exists between the datasets used to pre-train such a visual encoder and the dataset of the downstream IDC task, and (2) the visual feature extractor, when encoding the two images separately, often fails to effectively capture the visual changes between them. Motivated by the excellent zero-shot performance of the recently proposed CLIP, we propose CLIP4IDC, which transfers a CLIP model to the IDC task to address these issues. Rather than directly fine-tuning CLIP to generate sentences, we introduce an adaptation training stage that adapts CLIP's visual encoder to capture and align the differences within image pairs based on their textual descriptions. Experiments on three IDC benchmark datasets, CLEVR-Change, Spot-the-Diff, and Image-Editing-Request, demonstrate the effectiveness of CLIP4IDC.
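To make the adaptation stage concrete, the sketch below shows one plausible way such retrieval-style training could be set up: the embeddings of the "before" and "after" images are fused into a single pair representation that is aligned with the embedding of the difference caption through a symmetric contrastive loss. This is a minimal illustration under our own assumptions, not the authors' released code; `vision_encoder`, `text_encoder`, the linear fusion, and the loss formulation are all placeholders standing in for CLIP's components and the paper's actual design.

```python
# Hypothetical sketch of a CLIP adaptation step for image pairs.
# Assumptions: vision_encoder/text_encoder return (B, D) embeddings,
# fusion and the symmetric InfoNCE loss are illustrative choices only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PairAdapter(nn.Module):
    def __init__(self, vision_encoder, text_encoder, embed_dim=512):
        super().__init__()
        self.vision_encoder = vision_encoder   # e.g. CLIP's image encoder (assumed interface)
        self.text_encoder = text_encoder       # e.g. CLIP's text encoder (assumed interface)
        # Fuse the two image embeddings into one "difference" embedding.
        self.fusion = nn.Linear(2 * embed_dim, embed_dim)
        # Learnable temperature, initialised near CLIP's log(1/0.07).
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, img_before, img_after, caption_tokens):
        v1 = self.vision_encoder(img_before)                 # (B, D)
        v2 = self.vision_encoder(img_after)                  # (B, D)
        pair = self.fusion(torch.cat([v1, v2], dim=-1))      # (B, D) pair embedding
        txt = self.text_encoder(caption_tokens)              # (B, D) caption embedding

        pair = F.normalize(pair, dim=-1)
        txt = F.normalize(txt, dim=-1)
        logits = self.logit_scale.exp() * pair @ txt.t()     # (B, B) similarity matrix
        targets = torch.arange(logits.size(0), device=logits.device)

        # Symmetric contrastive loss: pair->caption and caption->pair retrieval.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2
```

After this alignment stage, the adapted visual encoder would then be used as the feature extractor for the captioning model that generates the difference sentences.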