November 17, 2021

End-to-End Dense Video Captioning with Parallel Decoding


Teng Wang, Ruimao Zhang, Zhichao Lu, Feng Zheng, Ran Cheng, Ping Luo

From arXiv; accepted by ICCV 2021

Dense video captioning aims to generate multiple associated captions with their temporal locations from the video. Previous methods follow a sophisticated "localize-then-describe" scheme, which heavily relies on numerous hand-crafted components. In this paper, we propose a simple yet effective framework for end-to-end dense video captioning with parallel decoding (PDVC), by formulating the dense caption generation as a set prediction task. In practice, through stacking a newly proposed event counter on top of a transformer decoder, PDVC precisely segments the video into a number of event pieces under the holistic understanding of the video content, which effectively increases the coherence and readability of predicted captions. Compared with prior art, PDVC has several appealing advantages: (1) Without relying on heuristic non-maximum suppression or a recurrent event sequence selection network to remove redundancy, PDVC directly produces an event set with an appropriate size; (2) In contrast to adopting the two-stage scheme, we feed the enhanced representations of event queries into the localization head and caption head in parallel, making these two sub-tasks deeply interrelated and mutually promoted through the optimization; (3) Without bells and whistles, extensive experiments on ActivityNet Captions and YouCook2 show that PDVC is capable of producing high-quality captioning results, surpassing the state-of-the-art two-stage methods when its localization accuracy is on par with them. Code is available at https://github.com/ttengwang/PDVC.
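To make advantage (1) concrete, here is a minimal sketch of how a predicted event count could be used to pick a final event set without non-maximum suppression. It is an illustration only, not the code from the PDVC repository: the function `select_events`, the `Event` record, and the assumption that the counter emits one score per candidate count are all hypothetical, and the inputs stand in for the per-query outputs of the localization and caption heads.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Event:
    start: float       # segment start, in seconds
    end: float         # segment end, in seconds
    caption: str
    confidence: float


def select_events(
    segments: Sequence[Tuple[float, float]],
    captions: Sequence[str],
    confidences: Sequence[float],
    count_scores: Sequence[float],
) -> List[Event]:
    """Keep the top-N query outputs, where N is taken from the counter scores.

    Assumption: count_scores[k] scores the hypothesis "the video has k events".
    """
    if not (len(segments) == len(captions) == len(confidences)):
        raise ValueError("every query needs a segment, a caption and a confidence")
    if not count_scores:
        raise ValueError("count_scores must not be empty")

    # Most likely event count, capped by the number of available queries.
    predicted_count = max(range(len(count_scores)), key=count_scores.__getitem__)
    predicted_count = min(predicted_count, len(segments))

    # Rank queries by confidence; there is no NMS or sequential selection step.
    ranked = sorted(range(len(segments)), key=lambda i: confidences[i], reverse=True)
    kept = [
        Event(segments[i][0], segments[i][1], captions[i], confidences[i])
        for i in ranked[:predicted_count]
    ]

    # Return the kept events in temporal order.
    return sorted(kept, key=lambda event: event.start)


if __name__ == "__main__":
    events = select_events(
        segments=[(0.0, 5.0), (4.0, 9.0), (10.0, 20.0), (1.0, 2.0)],
        captions=["a man opens a box", "he looks inside", "he assembles a shelf", "a box"],
        confidences=[0.9, 0.2, 0.8, 0.1],
        count_scores=[0.1, 0.2, 0.6, 0.1],
    )
    for event in events:
        print(f"[{event.start:.1f}s, {event.end:.1f}s] {event.caption}")
```

Because each query already carries both a segment and a caption, the selection reduces to a single ranking step. This is the sense in which the set-prediction formulation avoids a separate redundancy-removal stage.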

