Unifying Event Detection and Captioning as Sequence Generation via Pre-Training

Dense video captioning aims to generate text descriptions for a series of events in an untrimmed video. It can be divided into two sub-tasks: event detection and event captioning. Unlike earlier works that tackle the two sub-tasks separately, recent works have focused on strengthening the association between them. However, designing inter-task interactions for event detection and captioning is not trivial, because their task-specific solutions differ greatly. In addition, previous event detection methods normally ignore temporal dependencies between events, which leads to redundant or inconsistent events. To tackle these two defects, we define event detection as a sequence generation task and propose a unified pre-training and fine-tuning framework that naturally strengthens the association between event detection and captioning. Since the model predicts each event with the previous events as context, the inter-dependency between events is fully exploited, and the model can therefore detect more diverse and consistent events in the video. Experiments on the ActivityNet dataset show that our model outperforms the state-of-the-art methods, and that it can be further boosted when pre-trained on extra large-scale video-text data. Code is available at https://github.com/QiQAng/UEDVC.
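To make the idea of predicting each event with previous events as context concrete, here is a minimal sketch in Python. It is not the model from the paper: `toy_score`, `generate_events`, the candidate segments, and the stop threshold are all hypothetical, and the learned decoder is replaced by a hand-written rule that lowers a candidate's score when it overlaps events already generated. It only illustrates why conditioning on the event history can reduce redundant detections.

```python
from typing import Dict, List, Tuple

Event = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(a: Event, b: Event) -> float:
    """Intersection-over-union of two time segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def toy_score(candidate: Event, history: List[Event], base: float) -> float:
    """Hypothetical stand-in for a learned decoder step.

    The base confidence is scaled down by the largest overlap between the
    candidate and any event generated so far (the "context").
    """
    overlap = max((temporal_iou(candidate, h) for h in history), default=0.0)
    return base * (1.0 - overlap)


def generate_events(
    candidates: Dict[Event, float],
    max_events: int = 5,
    stop_threshold: float = 0.3,
) -> List[Event]:
    """Greedily emit events one at a time, each conditioned on the history."""
    history: List[Event] = []
    remaining = dict(candidates)
    while remaining and len(history) < max_events:
        # Re-score every remaining candidate against the current history.
        best = max(remaining, key=lambda c: toy_score(c, history, remaining[c]))
        if toy_score(best, history, remaining[best]) < stop_threshold:
            break  # assumed stopping rule: nothing left is confident enough
        history.append(best)
        del remaining[best]
    return history


if __name__ == "__main__":
    # Two near-duplicate segments and one distinct segment (made-up values).
    candidates = {(0.0, 10.0): 0.9, (1.0, 10.0): 0.85, (12.0, 20.0): 0.6}
    print(generate_events(candidates))
```

With these made-up candidates, the near-duplicate segment (1.0, 10.0) is skipped once (0.0, 10.0) has been generated, so the sketch returns `[(0.0, 10.0), (12.0, 20.0)]`. Scoring each candidate independently would instead rank the duplicate second.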

