STDepthFormer: Predicting Spatio-temporal Depth from Video with a Self-supervised Transformer Model

This paper proposes a self-supervised model that simultaneously predicts a sequence of future frames from video input using a novel spatial-temporal attention (ST) network. The ST transformer network constrains temporal consistency across future frames while also constraining consistency across spatial objects in the image at different scales. Prior works on depth prediction did not do this, focusing instead on predicting a single output frame. The proposed model leverages prior scene knowledge such as object shape and texture, similar to single-image depth inference methods, while also constraining the motion and geometry from a sequence of input images. Apart from the transformer architecture, one of the main contributions with respect to prior works lies in the objective function, which enforces spatio-temporal consistency across a sequence of output frames rather than a single output frame. As will be shown, this results in more accurate and robust depth sequence forecasting. The model achieves depth forecasting results that outperform existing baselines on the KITTI benchmark. Extensive ablation studies were performed to assess the effectiveness of the proposed techniques. One remarkable result is that the model is implicitly capable of forecasting the motion of objects in the scene, without requiring complex models involving multi-object detection, segmentation and tracking.
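The abstract does not spell out how the ST network is built, so the following is only an illustrative sketch of the general idea of joint space-time self-attention, not the authors' architecture. It assumes that each frame has already been split into patch tokens, and it uses the raw tokens as queries, keys and values (no learned projections, no multi-scale handling). The function name `space_time_attention` and the toy input are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def space_time_attention(frames):
    """Joint self-attention over every patch token of every frame.

    frames: list of T frames, each a list of N patch tokens,
            each token a list of D floats.
    Returns the same nested shape. Queries, keys and values are the raw
    tokens (an assumption made to keep the sketch short).
    """
    num_patches = len(frames[0])
    # Flatten (T, N, D) into one sequence of T*N tokens so that each token
    # can attend to tokens at other positions and in other frames.
    tokens = [tok for frame in frames for tok in frame]
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)

    out = []
    for q in tokens:
        # Scaled dot-product score between this query and every key.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append(
            [sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)]
        )

    # Restore the (T, N, D) nesting.
    return [out[t * num_patches:(t + 1) * num_patches] for t in range(len(frames))]


# Toy input: 2 frames, 2 patches per frame, 2-dimensional tokens.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
]
mixed = space_time_attention(frames)
```

Because the attention weights are computed over the flattened sequence, a patch in one frame can draw on similar patches in other frames, which is one simple way to couple spatial and temporal context in a single operation.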

