用于图像说明的语义- 有条件传播网络 (Semantic-Conditional Diffusion Networks for Image Captioning) - 专知论文

会员服务 ·

0

图像字幕 · MoDELS · Networking · Learning · Processing（编程语言） ·

2022 年 12 月 6 日

Semantic-Conditional Diffusion Networks for Image Captioning

翻译：用于图像说明的语义- 有条件传播网络

Jianjie Luo,Yehao Li,Yingwei Pan,Ting Yao,Jianlin Feng,Hongyang Chao,Tao Mei

from arxiv, Source code is available at \url{https://github.com/YehLi/xmodaler/tree/master/configs/image_caption/scdnet}

Recent advances on text-to-image generation have witnessed the rise of diffusion models which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent variable models to capture the dependency among discrete words and meanwhile pursue complex visual-language alignment in image captioning. In this paper, we break the deeply rooted conventions in learning Transformer-based encoder-decoder, and propose a new diffusion model based paradigm tailored for image captioning, namely Semantic-Conditional Diffusion Networks (SCD-Net). Technically, for each input image, we first search the semantically relevant sentences via cross-modal retrieval model to convey the comprehensive semantic information. The rich semantics are further regarded as semantic prior to trigger the learning of Diffusion Transformer, which produces the output sentence in a diffusion process. In SCD-Net, multiple Diffusion Transformer structures are stacked to progressively strengthen the output sentence with better visional-language alignment and linguistical coherence in a cascaded manner. Furthermore, to stabilize the diffusion process, a new self-critical sequence training strategy is designed to guide the learning of SCD-Net with the knowledge of a standard autoregressive Transformer model. Extensive experiments on COCO dataset demonstrate the promising potential of using diffusion models in the challenging image captioning task. Source code is available at \url{https://github.com/YehLi/xmodaler/tree/master/configs/image_caption/scdnet}.

翻译：文本到图像生成的最新进步见证了作为强大基因化模型的传播模型的崛起。然而,利用这些潜伏变量模型来捕捉离散单词之间的依赖性,同时在图像字幕中追求复杂的视觉语言调整。在本文中,我们在学习基于变异器的编码解码器(Decoder-decoder)的过程中打破了根深蒂固的常规,并提议了一个新的基于图像字幕的传播模型模式,即Semantic-Sotitional Difnational Net(SCD-Net) 。从技术上讲,我们首先通过跨模式检索模型来查找具有语义相关性的句子,以传递全面的语义信息。丰富的语义学学被进一步视为语义学学,以启动在扩散过程中产生输出句的Difmult 变异变器的学习。在SCD-Net中,多种Difmultion Gerverer 结构堆叠起来,以逐步加强产出句子,以更好的视觉语言调和语言调调的方式。此外,为了稳定传播过程,新的自我批评的序列分析分析策略培训策略战略,在SCD-devicreglal lial listrational listrational livestidustrual liglemental deal deal lixal lixal liftal laftal taduction sal liftal laftal laftital sal sal laftal laftmal laftmex sil sil sil sal lemental 上,设计了Sildalking labild

0

相关内容

图像字幕

图像字幕（Image Captioning）,是指从图像生成文本描述的过程，主要根据图像中物体和物体的动作。

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

专知会员服务

10+阅读 · 2022年3月19日

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

专知会员服务

26+阅读 · 2020年5月5日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

专知

19+阅读 · 2018年6月1日

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

专知

10+阅读 · 2018年4月22日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

湿滑条件下仿蝾螈脚掌微结构的弹性体足垫壁面粘附机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

离子液体中Fe3O4多级结构的构筑及磁性研究

国家自然科学基金

0+阅读 · 2013年12月31日

化学气相沉积法制备石墨烯生长机理的计算机模拟研究

国家自然科学基金

0+阅读 · 2012年12月31日

导电聚合物-贵金属纳米复合材料的原位可控合成及电催化性能

国家自然科学基金

0+阅读 · 2012年12月31日

癌症的靶向基因 - 痘苗溶瘤病毒治疗策略

国家自然科学基金

1+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义的敦煌壁画的模拟与渲染

国家自然科学基金

0+阅读 · 2012年12月31日

关联体系中电荷有序的原位同步辐射表征

国家自然科学基金

0+阅读 · 2011年12月31日

噻二唑类金属配合物的合成、表征及电致发光性能研究

国家自然科学基金

0+阅读 · 2008年12月31日

KENGIC: KEyword-driven and N-Gram Graph based Image Captioning

Arxiv

0+阅读 · 2023年2月7日

Eliminating Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion

Arxiv

0+阅读 · 2023年2月7日

CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis

Arxiv

0+阅读 · 2023年2月6日

Backdoor Attacks on Time Series: A Generative Approach

Arxiv

0+阅读 · 2023年2月5日

Question-controlled Text-aware Image Captioning

Arxiv

10+阅读 · 2021年8月4日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Compositional GAN: Learning Conditional Image Composition

Compositional GAN: Learning Conditional Image Composition

Arxiv

31+阅读 · 2018年7月19日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Image Captioning using Deep Neural Architectures

Arxiv

20+阅读 · 2018年1月17日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

Processing（编程语言）

相关VIP内容

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

【CVPR 2022】基于粗粒度和细粒度特征匹配的视频描述评估，EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching

专知会员服务

10+阅读 · 2022年3月19日

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

【ACL2020】用于生成深度问题的语义图，Semantic Graphs for Generating Deep Questions

专知会员服务

26+阅读 · 2020年5月5日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

166+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

【人工智能在2019：一年回顾】反人工智能，AI in 2019: A Year in Review

专知会员服务

79+阅读 · 2019年10月10日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

【普林斯顿博士论文】在线学习：优化、控制与学习理论

不确定环境下无人机三维路径规划研究 | 221页

【NeurIPS2025】《LeapFactual：基于条件流匹配的可靠视觉反事实解释》

大语言模型将如何改变军事指挥结构

相关资讯

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Latest News & Announcements of the Tutorial

【ICIG2021】Latest News & Announcements of the Tutorial

中国图象图形学学会CSIG

3+阅读 · 2021年12月20日

【ICIG2021】Latest News & Announcements of the Workshop

【ICIG2021】Latest News & Announcements of the Workshop

中国图象图形学学会CSIG

0+阅读 · 2021年12月20日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

vae 相关论文表示学习 1

vae 相关论文表示学习 1

CreateAMind

12+阅读 · 2018年9月6日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

【论文推荐】最新七篇图像分割相关论文—域适应深度表示学习、循环残差卷积、二值分割、图像合成、无监督跨模态

专知

19+阅读 · 2018年6月1日

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

【论文推荐】最新七篇图像检索相关论文—草图、Tie-Aware、场景图解析、叠加跨注意力机制、深度哈希、人群估计

专知

10+阅读 · 2018年4月22日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

相关论文

KENGIC: KEyword-driven and N-Gram Graph based Image Captioning

Arxiv

0+阅读 · 2023年2月7日

Eliminating Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion

Arxiv

0+阅读 · 2023年2月7日

CHIMLE: Conditional Hierarchical IMLE for Multimodal Conditional Image Synthesis

Arxiv

0+阅读 · 2023年2月6日

Backdoor Attacks on Time Series: A Generative Approach

Arxiv

0+阅读 · 2023年2月5日

Question-controlled Text-aware Image Captioning

Arxiv

10+阅读 · 2021年8月4日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Compositional GAN: Learning Conditional Image Composition

Compositional GAN: Learning Conditional Image Composition

Arxiv

31+阅读 · 2018年7月19日

Image Captioning

Arxiv

11+阅读 · 2018年5月13日

Image Captioning using Deep Neural Architectures

Arxiv

20+阅读 · 2018年1月17日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

相关基金

Copine VII在阿尔茨海默病中的作用机制研究

国家自然科学基金

0+阅读 · 2015年12月31日

湿滑条件下仿蝾螈脚掌微结构的弹性体足垫壁面粘附机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

离子液体中Fe3O4多级结构的构筑及磁性研究

国家自然科学基金

0+阅读 · 2013年12月31日

化学气相沉积法制备石墨烯生长机理的计算机模拟研究

国家自然科学基金

0+阅读 · 2012年12月31日

导电聚合物-贵金属纳米复合材料的原位可控合成及电催化性能

国家自然科学基金

0+阅读 · 2012年12月31日

癌症的靶向基因 - 痘苗溶瘤病毒治疗策略

国家自然科学基金

1+阅读 · 2012年12月31日

LIMK1：罗格列酮抑制人胃癌细胞增殖、迁移及侵袭的作用靶点

国家自然科学基金

0+阅读 · 2012年12月31日

基于语义的敦煌壁画的模拟与渲染

国家自然科学基金

0+阅读 · 2012年12月31日

关联体系中电荷有序的原位同步辐射表征

国家自然科学基金

0+阅读 · 2011年12月31日

噻二唑类金属配合物的合成、表征及电致发光性能研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员