Multi-modal Video Chapter Generation

Chapter generation has become a practical technique for online videos. Chapter breakpoints let users quickly find the parts they want and read summarizing annotations. However, no public method or dataset exists for this task. To facilitate research in this direction, we introduce a new dataset called Chapter-Gen, which consists of approximately 10k user-generated videos with annotated chapter information. Our data collection procedure is fast, scalable, and does not require any additional manual annotation. On top of this dataset, we design an effective baseline specifically for the video chapter generation task, which captures two aspects of a video: visual dynamics and narration text. It disentangles local and global video features for localization and title generation, respectively. To parse long videos efficiently, a skip sliding window mechanism is designed to localize potential chapters, and a cross-attention multi-modal fusion module is developed to aggregate local features for title generation. Our experiments demonstrate that the proposed framework achieves superior results over existing methods, which illustrates that methods designed for similar tasks cannot be transferred directly, even after fine-tuning. Code and dataset are available at https://github.com/czt117/MVCG.
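The abstract does not spell out how the skip sliding window operates, so the following is only a minimal sketch of one plausible reading: a fixed-size window slides over per-segment features, and once a window is judged to contain a chapter boundary, the search jumps ahead by a larger `skip` instead of the usual `stride`. The names `localize_chapters`, `toy_scorer`, and every parameter value are hypothetical and are not taken from the MVCG code; the toy scorer stands in for a learned boundary classifier.

```python
from typing import Callable, List, Sequence


def localize_chapters(
    features: Sequence[float],
    score_window: Callable[[Sequence[float]], float],
    window: int = 4,
    stride: int = 1,
    skip: int = 6,
    threshold: float = 0.75,
) -> List[int]:
    """Return indices of predicted chapter boundaries (illustrative only)."""
    boundaries: List[int] = []
    start = 0
    while start + window <= len(features):
        clip = features[start:start + window]
        if score_window(clip) >= threshold:
            # Place the boundary at the window centre, then jump ahead,
            # assuming two chapters rarely start within `skip` steps.
            boundaries.append(start + window // 2)
            start += skip
        else:
            start += stride
    return boundaries


def toy_scorer(clip: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned model: a large difference
    between the two halves of the window suggests a boundary."""
    half = len(clip) // 2
    left = sum(clip[:half]) / half
    right = sum(clip[half:]) / (len(clip) - half)
    return abs(right - left)


# Toy "video": three segments of 10 steps with constant feature values.
features = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
print(localize_chapters(features, toy_scorer))  # [10, 20]
```

Under these assumptions, the skip reduces the number of windows that must be scored on long videos, at the cost of missing chapters shorter than `skip` steps.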

