精炼的语义增强,以向视频字幕传播频率扩散 (Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning) - 专知论文

会员服务 ·

0

视频描述生成（Video Caption） · 词元分析器 · Extensibility · INFORMS · state-of-the-art ·

2022 年 12 月 18 日

Refined Semantic Enhancement towards Frequency Diffusion for Video Captioning

翻译：精炼的语义增强,以向视频字幕传播频率扩散

Xian Zhong,Zipeng Li,Shuqin Chen,Kui Jiang,Chen Chen,Mang Ye

Video captioning aims to generate natural language sentences that describe the given video accurately. Existing methods obtain favorable generation by exploring richer visual representations in encode phase or improving the decoding ability. However, the long-tailed problem hinders these attempts at low-frequency tokens, which rarely occur but carry critical semantics, playing a vital role in the detailed generation. In this paper, we introduce a novel Refined Semantic enhancement method towards Frequency Diffusion (RSFD), a captioning model that constantly perceives the linguistic representation of the infrequent tokens. Concretely, a Frequency-Aware Diffusion (FAD) module is proposed to comprehend the semantics of low-frequency tokens to break through generation limitations. In this way, the caption is refined by promoting the absorption of tokens with insufficient occurrence. Based on FAD, we design a Divergent Semantic Supervisor (DSS) module to compensate for the information loss of high-frequency tokens brought by the diffusion process, where the semantics of low-frequency tokens is further emphasized to alleviate the long-tailed problem. Extensive experiments indicate that RSFD outperforms the state-of-the-art methods on two benchmark datasets, i.e., MSR-VTT and MSVD, demonstrate that the enhancement of low-frequency tokens semantics can obtain a competitive generation effect. Code is available at https://github.com/lzp870/RSFD.

翻译：视频字幕旨在生成能够准确描述视频内容的自然语言句子。现有方法通过在编码阶段或提高解码能力中探索更丰富的视觉表达方式获得了有利的生成。然而,长期的问题阻碍了低频符号的这些尝试,这种尝试很少发生,但带有关键的语义,在详细一代中发挥着关键作用。在本文中,我们为频率扩散(RSFD)引入了一种新型精炼的语义强化方法,即频流传(RSFD),这是一种不断看到不常见符号语言代表的字幕模式。具体地说,建议了一个频率-Aware Difmission(FAD)模块来理解低频符号的语义表达方式,以打破生成限制。如此,通过促进吸收没有足够发生的代代代号来改进了字幕。根据FADD,我们设计了一个Dvergent Smantict 督导(DSS)模块,以补偿传播过程带来的高频符号信息损失,在此过程中,进一步强调低频符号的语义表达(FD)模块,以缓解长尾带问题。广泛的实验表明, RFDMFDD 的S- 的代SDDMex- 演示中的现有数据升级。

1

相关内容

视频描述生成（Video Caption）

视频描述生成（Video Caption）

视频描述生成（Video Caption），就是从视频中自动生成一段描述性文字

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

靶向抑制 MNK-eIF4E 轴增效TRAIL治疗鼻咽癌的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Zbed3在非小细胞肺癌发生发展中的作用及分子机制的研究

国家自然科学基金

0+阅读 · 2014年12月31日

PIM-1信号通路在非小细胞肺癌EGFR-TKI获得性耐药中的作用及其分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

冷胁迫诱导柽柳ThCAP基因表达的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

鸡毒支原体感染相关miRNAs鉴定及其分子调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

survivin拮抗细胞衰老的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

miR-124和miR-27对阿尔茨海默病BACE1基因影响的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

水稻VIP1同源基因的功能鉴定与应用研究

国家自然科学基金

0+阅读 · 2009年12月31日

Neighbor Based Enhancement for the Long-Tail Ranking Problem in Video Rank Models

Arxiv

0+阅读 · 2023年2月16日

Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance

Arxiv

0+阅读 · 2023年2月16日

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

Arxiv

0+阅读 · 2023年2月15日

Video Probabilistic Diffusion Models in Projected Latent Space

Arxiv

2+阅读 · 2023年2月15日

Offline-to-Online Knowledge Distillation for Video Instance Segmentation

Arxiv

0+阅读 · 2023年2月15日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Towards Open World Object Detection

Arxiv

13+阅读 · 2021年3月3日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Video Captioning via Hierarchical Reinforcement Learning

Arxiv

20+阅读 · 2018年3月29日

VIP会员

文章信息

相关主题

视频描述生成（Video Caption）

词元分析器

state-of-the-art

相关VIP内容

NeurlPS 2022 | 自然语言处理相关论文分类整理

NeurlPS 2022 | 自然语言处理相关论文分类整理

专知会员服务

51+阅读 · 2022年10月2日

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

高效可扩展图神经网络的研究进展，Recent Advances in Efficient and Scalable Graph Neural Networks

专知会员服务

78+阅读 · 2022年3月15日

ICLR 2021杰出论文奖出炉，8篇论文上榜！

专知会员服务

26+阅读 · 2021年4月2日

【跨语言BERT模型大集合】Transfer learning is increasingly going multilingual with language-specific BERT models

专知会员服务

54+阅读 · 2020年1月30日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

美军“泰坦（TITAN）地面站目标系统”：是颠覆还是一场可预见的军事进步？

美空军指挥参谋学院 · 联合空中作战规划课程介绍（2025年） | 22页

一种基于视觉算法生成三维场景重建的多任务系统 | 2025最新200页

北约第十七届（2025年）网络冲突国际会议论文集 | 272页

相关资讯

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

AIART 2022 Call for Papers

AIART 2022 Call for Papers

CCF多媒体专委会

1+阅读 · 2022年2月13日

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

【ICIG2021】Check out the hot new trailer of ICIG2021 Symposium9

中国图象图形学学会CSIG

0+阅读 · 2021年12月17日

【ICIG2021】Latest News & Announcements of the Industry Talk2

【ICIG2021】Latest News & Announcements of the Industry Talk2

中国图象图形学学会CSIG

0+阅读 · 2021年7月29日

【ICIG2021】Latest News & Announcements of the Industry Talk1

【ICIG2021】Latest News & Announcements of the Industry Talk1

中国图象图形学学会CSIG

0+阅读 · 2021年7月28日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

相关论文

Neighbor Based Enhancement for the Long-Tail Ranking Problem in Video Rank Models

Arxiv

0+阅读 · 2023年2月16日

Towards Reliable Image Outpainting: Learning Structure-Aware Multimodal Fusion with Depth Guidance

Arxiv

0+阅读 · 2023年2月16日

SparseFusion: Distilling View-conditioned Diffusion for 3D Reconstruction

Arxiv

0+阅读 · 2023年2月15日

Video Probabilistic Diffusion Models in Projected Latent Space

Arxiv

2+阅读 · 2023年2月15日

Offline-to-Online Knowledge Distillation for Video Instance Segmentation

Arxiv

0+阅读 · 2023年2月15日

From Show to Tell: A Survey on Image Captioning

Arxiv

15+阅读 · 2021年7月14日

Towards Open World Object Detection

Arxiv

13+阅读 · 2021年3月3日

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Spatio-Temporal Graph for Video Captioning with Knowledge Distillation

Arxiv

19+阅读 · 2020年3月31日

Exploring Visual Relationship for Image Captioning

Exploring Visual Relationship for Image Captioning

Arxiv

15+阅读 · 2018年9月19日

Video Captioning via Hierarchical Reinforcement Learning

Arxiv

20+阅读 · 2018年3月29日

相关基金

靶向抑制 MNK-eIF4E 轴增效TRAIL治疗鼻咽癌的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

Zbed3在非小细胞肺癌发生发展中的作用及分子机制的研究

国家自然科学基金

0+阅读 · 2014年12月31日

PIM-1信号通路在非小细胞肺癌EGFR-TKI获得性耐药中的作用及其分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

冷胁迫诱导柽柳ThCAP基因表达的分子机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

CuInGaSe2太阳能电池界面结构、界面态及其钝化

国家自然科学基金

0+阅读 · 2012年12月31日

实时安全关键系统的建模、仿真与验证

国家自然科学基金

1+阅读 · 2012年12月31日

鸡毒支原体感染相关miRNAs鉴定及其分子调控机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

survivin拮抗细胞衰老的机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

miR-124和miR-27对阿尔茨海默病BACE1基因影响的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

水稻VIP1同源基因的功能鉴定与应用研究

国家自然科学基金

0+阅读 · 2009年12月31日

微信扫码咨询专知VIP会员