Our goal is to forecast the near future given a set of recent observations. We think this ability to forecast, i.e., to anticipate, is integral to the success of autonomous agents, which must not only passively analyze an observation but also react to it in real time. Importantly, accurate forecasting hinges upon the chosen scene decomposition. We think that superior forecasting can be achieved by decomposing a dynamic scene into individual 'things' and background 'stuff': background 'stuff' largely moves because of camera motion, while foreground 'things' move because of both camera motion and individual object motion. Following this decomposition, we introduce panoptic segmentation forecasting. Panoptic segmentation forecasting opens up a middle ground between existing extremes, which either forecast instance trajectories or predict the appearance of future image frames. To address this task we develop a two-component model: one component learns the dynamics of the background stuff by anticipating odometry, while the other anticipates the dynamics of detected things. We establish a leaderboard for this novel task, and validate a state-of-the-art model that outperforms available baselines.
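To make the things/stuff decomposition concrete, the following is a minimal sketch, not the authors' implementation: it assumes per-pixel depth, camera intrinsics, and an anticipated future camera pose are available, and it models each 'thing' motion as a simple 2D pixel shift purely for illustration. All function and variable names (`forecast_stuff`, `forecast_panoptic`, `thing_motions`, etc.) are hypothetical.

```python
# Conceptual sketch of two-component panoptic forecasting (illustrative only).
# Assumes strictly positive depth and a known 4x4 anticipated camera pose.
import numpy as np

def forecast_stuff(stuff_seg, depth, K, T_future):
    """Warp background 'stuff' labels into the future frame: back-project
    pixels with depth, apply the anticipated egomotion, re-project."""
    h, w = stuff_seg.shape
    ys, xs = np.mgrid[0:h, 0:w]
    rays = np.linalg.inv(K) @ np.stack([xs, ys, np.ones_like(xs)]).reshape(3, -1)
    pts = rays * depth.reshape(1, -1)                    # back-project to 3D
    pts = T_future[:3, :3] @ pts + T_future[:3, 3:4]     # apply camera motion
    uvw = K @ pts                                        # re-project
    u, v = (uvw[:2] / uvw[2:]).round().astype(int)
    out = np.zeros_like(stuff_seg)
    keep = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    out[v[keep], u[keep]] = stuff_seg.reshape(-1)[keep]  # forward warp
    return out

def forecast_panoptic(stuff_seg, depth, K, T_future, thing_masks, thing_motions):
    """Compose the panoptic forecast: warped stuff overlaid with each
    forecast 'thing' mask (thing motion here is a toy 2D shift)."""
    pan = forecast_stuff(stuff_seg, depth, K, T_future)
    for inst_id, (mask, (dy, dx)) in enumerate(zip(thing_masks, thing_motions), 1):
        shifted = np.roll(np.roll(mask, dy, axis=0), dx, axis=1)
        pan[shifted > 0] = 1000 + inst_id                # instance ids on top
    return pan
```

In the sketch, the stuff component depends only on anticipated camera motion, while each thing additionally carries its own motion, mirroring the decomposition described above; the real model replaces the toy shift with learned per-instance dynamics.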