Motion, scene and object are the three primary visual components of a video: objects form the foreground, scenes form the background, and motion traces their dynamics. Based on this insight, we propose MOSO, a two-stage MOtion, Scene and Object decomposition framework for video prediction, consisting of MOSO-VQVAE and MOSO-Transformer. In the first stage, MOSO-VQVAE decomposes the previous video clip into motion, scene and object components and represents each as a distinct group of discrete tokens. In the second stage, MOSO-Transformer predicts the object and scene tokens of the subsequent video clip from the previous tokens and then adds motion to the generated object and scene tokens at the token level. Our framework extends easily to unconditional video generation and video frame interpolation. Experimental results show that our method achieves new state-of-the-art performance on five challenging benchmarks for video prediction and unconditional video generation: BAIR, RoboNet, KTH, KITTI and UCF101. In addition, MOSO can produce realistic videos by combining objects and scenes from different videos.
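To make the two-stage interface concrete, the following is a minimal PyTorch sketch of how a decomposed tokenizer and a token-level predictor could fit together. The module names (MOSOVQVAE, MOSOTransformer), codebook sizes, layer counts and the per-component encoder heads are illustrative assumptions; they do not reproduce the paper's actual architecture.

```python
# Minimal sketch of a two-stage motion/scene/object token pipeline.
# All internals below are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn


class MOSOVQVAE(nn.Module):
    """Stage 1 (sketch): encode a clip into motion, scene and object token grids."""

    def __init__(self, codebook_size=1024, dim=64):
        super().__init__()
        # One shared frame encoder; three 1x1 heads, one per component (assumption).
        self.encoder = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.heads = nn.ModuleDict(
            {k: nn.Conv2d(dim, dim, 1) for k in ("motion", "scene", "object")}
        )
        self.codebooks = nn.ModuleDict(
            {k: nn.Embedding(codebook_size, dim) for k in ("motion", "scene", "object")}
        )

    def encode(self, clip):
        # clip: (B, T, 3, H, W) -> dict of discrete token indices per component.
        b, t, _, _, _ = clip.shape
        feats = self.encoder(clip.flatten(0, 1))            # (B*T, dim, h, w)
        tokens = {}
        for name, head in self.heads.items():
            z = head(feats)                                  # component features
            flat = z.permute(0, 2, 3, 1).flatten(0, 2)       # (B*T*h*w, dim)
            dist = torch.cdist(flat, self.codebooks[name].weight)
            tokens[name] = dist.argmin(-1).view(b, t, -1)    # nearest-code indices
        return tokens


class MOSOTransformer(nn.Module):
    """Stage 2 (sketch): predict future scene/object tokens from past tokens."""

    def __init__(self, codebook_size=1024, dim=64):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)
        self.backbone = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2
        )
        self.head = nn.Linear(dim, codebook_size)

    def forward(self, past_tokens):
        # Concatenate past scene and object tokens into one sequence (assumption)
        # and predict token logits for the subsequent clip.
        seq = torch.cat([past_tokens["scene"], past_tokens["object"]], dim=-1)
        x = self.embed(seq.flatten(1))                       # (B, L, dim)
        return self.head(self.backbone(x))                   # (B, L, codebook_size)


if __name__ == "__main__":
    clip = torch.randn(2, 4, 3, 64, 64)                      # toy past clip
    tokens = MOSOVQVAE().encode(clip)
    logits = MOSOTransformer()(tokens)
    print({k: v.shape for k, v in tokens.items()}, logits.shape)
```

In this sketch the motion tokens are produced by the tokenizer but left out of the predictor's input; the abstract's step of "adding motion at the token level" to the generated scene and object tokens would sit on top of this interface.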