Safe and proactive planning in robotic systems generally requires accurate predictions of the environment. Prior work on environment prediction applied video frame prediction techniques to bird's-eye view environment representations, such as occupancy grids. Previously used ConvLSTM-based frameworks often result in significant blurring and vanishing of moving objects, hindering their applicability in safety-critical applications. In this work, we propose two extensions to the ConvLSTM to address these issues. We present the Temporal Attention Augmented ConvLSTM (TAAConvLSTM) and Self-Attention Augmented ConvLSTM (SAAConvLSTM) frameworks for spatiotemporal occupancy prediction, and demonstrate improved performance over baseline architectures on the real-world KITTI and Waymo datasets.
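To make the ConvLSTM building block concrete, the following is a minimal, hypothetical sketch of a single-channel ConvLSTM cell in numpy. It is not the authors' implementation: the class name, kernel layout, and single-channel restriction are illustrative assumptions. The cell replaces the fully connected transforms of a standard LSTM with convolutions over the input grid x_t and previous hidden state h_{t-1}, which is what allows it to operate directly on bird's-eye view occupancy grids.

```python
import numpy as np

def conv2d_same(x, k):
    # Naive 'same'-padded 2D cross-correlation; x: (H, W), k: (kh, kw).
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Minimal single-channel ConvLSTM cell (illustrative sketch).

    Each gate (input i, forget f, output o, candidate g) is driven by a
    convolution over the current input grid and the previous hidden state.
    """
    def __init__(self, kernel_size=3, seed=0):
        rng = np.random.default_rng(seed)
        # One input kernel and one hidden-state kernel per gate (random init here).
        self.Wx = {g: 0.1 * rng.standard_normal((kernel_size, kernel_size)) for g in "ifog"}
        self.Wh = {g: 0.1 * rng.standard_normal((kernel_size, kernel_size)) for g in "ifog"}
        self.b = {g: 0.0 for g in "ifog"}

    def step(self, x, h, c):
        # Gate pre-activations: conv(x) + conv(h) + bias, per gate.
        pre = {g: conv2d_same(x, self.Wx[g]) + conv2d_same(h, self.Wh[g]) + self.b[g]
               for g in "ifog"}
        i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
        g = np.tanh(pre["g"])
        c_new = f * c + i * g          # cell state update
        h_new = o * np.tanh(c_new)     # hidden state (prediction feature map)
        return h_new, c_new
```

Rolling this cell forward over a sequence of occupancy grids yields the recurrent spatiotemporal predictions whose blurring the attention-augmented variants (TAAConvLSTM, SAAConvLSTM) are designed to mitigate.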