In self-driving, predicting the future locations and motions of all agents around the vehicle is a crucial requirement for planning. Recently, a joint formulation of perception and prediction has emerged that fuses rich sensory information perceived from multiple cameras into a compact bird's-eye-view representation to perform prediction. However, the quality of future predictions degrades over longer time horizons because multiple plausible futures exist. In this work, we address this inherent uncertainty in future prediction with a stochastic temporal model. Our model learns temporal dynamics in a latent space through stochastic residual updates at each time step. By sampling from a learned distribution at each time step, we obtain future predictions that are both more diverse and more accurate than previous work, particularly over spatially farther regions of the scene and longer time horizons. Despite processing each time step separately, our model remains efficient by decoupling the learning of dynamics from the generation of future predictions.
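The core mechanism above can be illustrated with a minimal sketch of stochastic residual updates in a latent space. This is a toy illustration, not the paper's architecture: the latent dimension, horizon, and the linear maps standing in for the learned residual-distribution network are all hypothetical placeholders.

```python
import numpy as np

# Hypothetical toy sizes (not from the paper).
LATENT_DIM = 8
HORIZON = 4

rng = np.random.default_rng(0)

# Stand-ins for learned networks that predict the residual distribution;
# a real model would use neural networks conditioned on the latent state.
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))
W_sigma = rng.normal(scale=0.1, size=(LATENT_DIM, LATENT_DIM))

def residual_distribution(state):
    """Predict mean and (positive) std of the stochastic residual."""
    mu = W_mu @ state
    sigma = 0.1 * np.exp(W_sigma @ state)  # exp keeps std positive
    return mu, sigma

def rollout(state, horizon, sample_rng):
    """Unroll latent dynamics: sample a residual each step and add it."""
    states = [state]
    for _ in range(horizon):
        mu, sigma = residual_distribution(states[-1])
        residual = sample_rng.normal(mu, sigma)  # sample from learned dist.
        states.append(states[-1] + residual)     # stochastic residual update
    return np.stack(states)

# Two rollouts from the same start diverge, giving diverse futures.
s0 = rng.normal(size=LATENT_DIM)
traj_a = rollout(s0, HORIZON, np.random.default_rng(1))
traj_b = rollout(s0, HORIZON, np.random.default_rng(2))
print(traj_a.shape)  # (HORIZON + 1, LATENT_DIM)
```

Because the randomness enters only through the per-step residual, a decoder can later map each latent state to a bird's-eye-view prediction, keeping dynamics learning and prediction generation decoupled as described above.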