SlotFormer: Unsupervised Visual Dynamics Simulation with Object-Centric Models

Understanding dynamics from visual observations is a challenging problem that requires disentangling individual objects from the scene and learning their interactions. While recent object-centric models can successfully decompose a scene into objects, modeling their dynamics effectively remains a challenge. We address this problem by introducing SlotFormer, a Transformer-based autoregressive model that operates on learned object-centric representations. Given a video clip, our approach reasons over object features to model spatio-temporal relationships and predicts accurate future object states. In this paper, we apply SlotFormer to video prediction on datasets with complex object interactions. Moreover, SlotFormer's unsupervised dynamics model can be used to improve performance on supervised downstream tasks such as Visual Question Answering (VQA) and goal-conditioned planning. Compared to past work on dynamics modeling, our method achieves significantly better long-term synthesis of object dynamics while retaining high-quality visual generation. In addition, SlotFormer enables VQA models to reason about the future without object-level labels, even outperforming counterparts that use ground-truth annotations. Finally, we show that it can serve as a world model for model-based planning, where it is competitive with methods designed specifically for such tasks.
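The abstract describes SlotFormer only at a high level. To illustrate the general idea (a Transformer that attends over the object slots of several past frames and autoregressively predicts the slots of the next frame), here is a minimal NumPy sketch. It is not the authors' implementation: the class `ToySlotTransformer`, the function `rollout`, the single-layer single-head architecture, the temporal positional embedding, the sliding `window`, and the random untrained weights are all hypothetical choices made for illustration. The input slots are random arrays standing in for the output of a pretrained object-centric encoder.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ToySlotTransformer:
    """Hypothetical single-layer, single-head Transformer over slot tokens.

    Weights are random: this shows the data flow only, not a trained
    model and not SlotFormer's actual architecture.
    """

    def __init__(self, slot_dim, max_steps, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(slot_dim)
        self.wq, self.wk, self.wv, self.wo = (
            rng.normal(0.0, s, (slot_dim, slot_dim)) for _ in range(4)
        )
        self.w1 = rng.normal(0.0, s, (slot_dim, hidden))
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, slot_dim))
        # Temporal positional embedding, shared by all slots of the same frame.
        self.time_emb = rng.normal(0.0, 0.02, (max_steps, slot_dim))

    def predict_next(self, history):
        """history: (T, N, D) slots of T past frames -> (N, D) slots of the next frame."""
        T, N, D = history.shape
        tokens = (history + self.time_emb[:T, None, :]).reshape(T * N, D)
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        # Every slot token attends to every slot in every frame of the context.
        attn = softmax(q @ k.T / np.sqrt(D))
        h = layer_norm(tokens + (attn @ v) @ self.wo)
        h = layer_norm(h + np.maximum(h @ self.w1, 0.0) @ self.w2)
        # Read out the tokens of the last frame as the prediction for the next one.
        return h.reshape(T, N, D)[-1]


def rollout(model, burn_in, num_future, window):
    """Autoregressively predict num_future frames of slots from burn_in (T, N, D)."""
    frames = list(burn_in)
    for _ in range(num_future):
        context = np.stack(frames[-window:])  # sliding window of recent frames
        frames.append(model.predict_next(context))  # feed predictions back in
    return np.stack(frames[len(burn_in):])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    slots = rng.normal(size=(6, 4, 16))  # 6 burn-in frames, 4 slots, 16-dim each
    model = ToySlotTransformer(slot_dim=16, max_steps=6)
    future = rollout(model, slots, num_future=10, window=6)
    print(future.shape)  # (10, 4, 16)
```

Feeding each prediction back in as input is what makes the rollout autoregressive. The sliding window keeps the attention cost bounded over long horizons, at the price of discarding frames older than `window`.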

