通过 CLIP 引导群优化 (Distinctive Image Captioning via CLIP Guided Group Optimization) - 专知论文

会员服务 ·

0

图像字幕 · MoDELS · GROUP · 优化器 · state-of-the-art ·

2022 年 8 月 11 日

Distinctive Image Captioning via CLIP Guided Group Optimization

翻译：通过 CLIP 引导群优化

Youyuan Zhang,Jiuniu Wang,Hao Wu,Wenjia Xu

Image captioning models are usually trained according to human annotated ground-truth captions, which could generate accurate but generic captions. In this paper, we focus on generating the distinctive captions that can distinguish the target image from other similar images. To evaluate the distinctiveness of captions, we introduce a series of metrics that use large-scale vision-language pre-training model CLIP to quantify the distinctiveness. To further improve the distinctiveness of captioning models, we propose a simple and effective training strategy which trains the model by comparing target image with similar image group and optimizing the group embedding gap. Extensive experiments are conducted on various baseline models to demonstrate the wide applicability of our strategy and the consistency of metric results with human evaluation. By comparing the performance of our best model with existing state-of-the-art models, we claim that our model achieves new state-of-the-art towards distinctiveness objective.

翻译：图像字幕模型通常根据人类附加说明的地面真实说明进行训练,这些说明可以产生准确但通用的字幕。在本文中,我们侧重于制作能够将目标图像与其他类似图像区分开来的独特字幕。为了评估字幕的独特性,我们推出了一系列衡量尺度,使用大规模视觉语言预科培训模型CLIP来量化其独特性。为了进一步提高字幕模型的独特性,我们提出了一个简单有效的培训战略,通过将目标图像与类似图像组进行比较和优化组合嵌入差距来培训模型。对各种基线模型进行了广泛的实验,以展示我们的战略的广泛适用性以及衡量结果与人类评估的一致性。通过将我们最佳模型的性能与现有最新模型进行比较,我们声称我们的模型实现了与独特性目标的新状态。

0

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

专知会员服务

10+阅读 · 2022年3月19日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

hedgehog-Gli信号通路在骨髓基质细胞对急性髓系白血病细胞凋亡影响中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

卫星重力测量反演高精度高分辨率局部地表质量变化的方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

钛离子对T细胞内线粒体调控钙信号通路影响的研究

国家自然科学基金

0+阅读 · 2014年12月31日

RNA解旋酶Prp5的翻译后修饰及其调控剪接机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

稻瘟病菌cAMP-PKA信号途径下游基因调控网络及其功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

pH响应离子液体的溶液化学研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-145/PAK4/LIMK1调控通路介导结直肠癌肝转移的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

结直肠癌预后相关基因的功能及表达调控机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

RKIP介导的MEK/ERK信号通路在海马损伤中的作用及其锌调控

国家自然科学基金

0+阅读 · 2009年12月31日

基于几何约束lifting技术的细分小波变换研究

国家自然科学基金

0+阅读 · 2009年12月31日

Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective

Arxiv

0+阅读 · 2022年10月5日

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

Arxiv

0+阅读 · 2022年10月3日

KNN-Diffusion: Image Generation via Large-Scale Retrieval

Arxiv

0+阅读 · 2022年10月2日

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

Arxiv

0+阅读 · 2022年9月30日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

Arxiv

19+阅读 · 2018年12月10日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

VIP会员

文章信息

相关主题

state-of-the-art

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

专知会员服务

10+阅读 · 2022年3月19日

2020数据工程师成长路线图

专知会员服务

19+阅读 · 2020年9月6日

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

【CVPR2020】通过自适应GANs生成不同的图像，Diverse Image Generation via Self-Conditioned GANs

专知会员服务

34+阅读 · 2020年6月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

Keras François Chollet 《Deep Learning with Python 》, 386页pdf

专知会员服务

160+阅读 · 2019年10月12日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【博士论文】低维与高维空间中潜在表征的分析、建模与变换

《生态建模密码破译：建模与编程实践》美陆军最新报告

大模型解决方案白皮书：社交陪伴场景全流程落地指南

面向具身操作的视觉-语言-动作模型综述

相关资讯

ACM MM 2022 Call for Papers

ACM MM 2022 Call for Papers

CCF多媒体专委会

5+阅读 · 2022年3月29日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium8

中国图象图形学学会CSIG

0+阅读 · 2021年11月16日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium6

中国图象图形学学会CSIG

2+阅读 · 2021年11月12日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium4

中国图象图形学学会CSIG

0+阅读 · 2021年11月10日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium3

中国图象图形学学会CSIG

0+阅读 · 2021年11月9日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium1

中国图象图形学学会CSIG

0+阅读 · 2021年11月3日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

相关论文

Improving Visual-Semantic Embedding with Adaptive Pooling and Optimization Objective

Arxiv

0+阅读 · 2022年10月5日

More Than Just Attention: Improving Cross-Modal Attentions with Contrastive Constraints for Image-Text Matching

Arxiv

0+阅读 · 2022年10月3日

KNN-Diffusion: Image Generation via Large-Scale Retrieval

Arxiv

0+阅读 · 2022年10月2日

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

Arxiv

0+阅读 · 2022年9月30日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

SlowFast Networks for Video Recognition

SlowFast Networks for Video Recognition

Arxiv

19+阅读 · 2018年12月10日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

CNN+CNN: Convolutional Decoders for Image Captioning

Arxiv

21+阅读 · 2018年5月23日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Image Captioning at Will: A Versatile Scheme for Effectively Injecting Sentiments into Image Descriptions

Arxiv

16+阅读 · 2018年1月30日

相关基金

hedgehog-Gli信号通路在骨髓基质细胞对急性髓系白血病细胞凋亡影响中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

卫星重力测量反演高精度高分辨率局部地表质量变化的方法研究

国家自然科学基金

0+阅读 · 2014年12月31日

钛离子对T细胞内线粒体调控钙信号通路影响的研究

国家自然科学基金

0+阅读 · 2014年12月31日

RNA解旋酶Prp5的翻译后修饰及其调控剪接机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

稻瘟病菌cAMP-PKA信号途径下游基因调控网络及其功能研究

国家自然科学基金

0+阅读 · 2012年12月31日

pH响应离子液体的溶液化学研究

国家自然科学基金

0+阅读 · 2012年12月31日

miR-145/PAK4/LIMK1调控通路介导结直肠癌肝转移的机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

结直肠癌预后相关基因的功能及表达调控机制的研究

国家自然科学基金

0+阅读 · 2009年12月31日

RKIP介导的MEK/ERK信号通路在海马损伤中的作用及其锌调控

国家自然科学基金

0+阅读 · 2009年12月31日

基于几何约束lifting技术的细分小波变换研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员