Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation - Zhuanzhi Papers

Latent · Video generation · U-Net · Video · Image generation

April 18, 2023

Jie An, Songyang Zhang, Harry Yang, Sonal Gupta, Jia-Bin Huang, Jiebo Luo, Xi Yin

From arXiv, https://latent-shift.github.io

We propose Latent-Shift -- an efficient text-to-video generation method based on a pretrained text-to-image generation model that consists of an autoencoder and a U-Net diffusion model. Learning a video diffusion model in the latent space is much more efficient than in the pixel space. The latter is often limited to first generating a low-resolution video followed by a sequence of frame interpolation and super-resolution models, which makes the entire pipeline very complex and computationally expensive. To extend a U-Net from image generation to video generation, prior work proposes to add additional modules like 1D temporal convolution and/or temporal attention layers. In contrast, we propose a parameter-free temporal shift module that can leverage the spatial U-Net as is for video generation. We achieve this by shifting two portions of the feature map channels forward and backward along the temporal dimension. The shifted features of the current frame thus receive the features from the previous and the subsequent frames, enabling motion learning without additional parameters. We show that Latent-Shift achieves comparable or better results while being significantly more efficient. Moreover, Latent-Shift can generate images despite being finetuned for T2V generation.
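
The temporal shift described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `temporal_shift`, the `shift_fraction` parameter, the (T, C, H, W) layout, and the zero-filling at the clip boundaries are all assumptions made for illustration, since the abstract says only that two portions of the channels are shifted forward and backward along the temporal dimension.

```python
import numpy as np


def temporal_shift(x: np.ndarray, shift_fraction: float = 0.25) -> np.ndarray:
    """Shift two channel groups along the time axis of x, shaped (T, C, H, W).

    shift_fraction is a hypothetical parameter: the share of channels moved in
    each direction. Vacated positions at the clip boundaries are zero-filled,
    which is an assumption rather than something the abstract specifies.
    """
    num_channels = x.shape[1]
    fold = int(num_channels * shift_fraction)
    out = np.zeros_like(x)
    # Group 1: frame t receives these channels from frame t - 1 (forward shift).
    out[1:, :fold] = x[:-1, :fold]
    # Group 2: frame t receives these channels from frame t + 1 (backward shift).
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]
    # Remaining channels stay with their own frame.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


# Toy clip: 3 frames, 4 channels, 1x1 spatial size; value = 10 * (frame + 1) + channel.
clip = np.array([[[[10.0 * (t + 1) + c]] for c in range(4)] for t in range(3)])
shifted = temporal_shift(clip, shift_fraction=0.25)
print(shifted[:, :, 0, 0])
```

In the printed result, channel 0 of each frame holds the value from the previous frame and channel 1 holds the value from the next frame, while channels 2 and 3 are unchanged. The operation only moves existing values, so it introduces no learnable parameters.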

