Phenaki: Variable Length Video Generation From Open Domain Textual Description - Zhuanzhi Papers

Tokenizer · MoDELS · Discretization · Generalization theory · Better

October 5, 2022

Phenaki: Variable Length Video Generation From Open Domain Textual Description


Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan

We present Phenaki, a model capable of realistic video synthesis given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, the limited quantity of high-quality text-video data, and the variable length of videos. To address these issues, we introduce a new model for learning video representations which compresses the video to a small representation of discrete tokens. This tokenizer uses causal attention in time, which allows it to work with variable-length videos. To generate video tokens from text, we use a bidirectional masked transformer conditioned on pre-computed text tokens. The generated video tokens are subsequently de-tokenized to create the actual video. To address data issues, we demonstrate how joint training on a large corpus of image-text pairs as well as a smaller number of video-text examples can result in generalization beyond what is available in the video datasets. Compared to previous video generation methods, Phenaki can generate arbitrarily long videos conditioned on a sequence of prompts (i.e., time-variable text or a story) in the open domain. To the best of our knowledge, this is the first time a paper studies generating videos from time-variable prompts. In addition, compared to the per-frame baselines, the proposed video encoder-decoder computes fewer tokens per video but results in better spatio-temporal consistency.
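The abstract states that the tokenizer uses causal attention in time so that it can handle variable-length videos. The following is a minimal sketch of that idea only, not the paper's implementation: the function names, the flat frame-by-frame token layout, and the choice to let tokens attend freely within their own frame are all assumptions made for illustration. It builds a boolean attention mask and checks that the mask for a short video is exactly the top-left block of the mask for a longer one, which is the property that lets the same model process clips of different lengths.

```python
def temporal_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[i][j] == True when token i may attend to token j.

    Hypothetical layout: tokens are stored frame by frame. A token may attend
    to every token in its own frame and in earlier frames, never later ones.
    """
    total = num_frames * tokens_per_frame
    # Frame index of each flattened token position.
    frame_of = [index // tokens_per_frame for index in range(total)]
    return [[frame_of[j] <= frame_of[i] for j in range(total)] for i in range(total)]


def is_prefix_consistent(short_frames: int, long_frames: int, tokens_per_frame: int) -> bool:
    """Check that the short-video mask equals the top-left block of the long-video mask."""
    short_mask = temporal_causal_mask(short_frames, tokens_per_frame)
    long_mask = temporal_causal_mask(long_frames, tokens_per_frame)
    size = short_frames * tokens_per_frame
    return all(long_mask[i][:size] == short_mask[i] for i in range(size))


if __name__ == "__main__":
    for row in temporal_causal_mask(num_frames=2, tokens_per_frame=2):
        print(["x" if allowed else "." for allowed in row])
    print(is_prefix_consistent(short_frames=2, long_frames=5, tokens_per_frame=3))
```

Because no token ever looks at a later frame, the representation of the first frames does not change when more frames are appended. Under the assumptions above, this is consistent with the abstract's claim that causal attention in time allows the tokenizer to work with variable-length videos.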

