Safety is still the main issue of autonomous driving, and in order to be globally deployed, autonomous vehicles need to predict pedestrians' motions sufficiently in advance. While there is a lot of research on coarse-grained prediction (human center prediction) and fine-grained prediction (human body keypoint prediction), we focus on 3D bounding boxes, which give autonomous vehicles a reasonable estimate of a human without modeling complex motion details. This gives the flexibility to predict over longer horizons in real-world settings. We introduce this new problem and present a simple yet effective model for pedestrians' 3D bounding box prediction. The method follows an encoder-decoder architecture based on recurrent neural networks, and our experiments show its effectiveness on both synthetic (JTA) and real-world (NuScenes) datasets. The learned representation also carries useful information that can enhance the performance of other tasks, such as action anticipation. Our code is available online: https://github.com/vita-epfl/bounding-box-prediction
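To make the recurrent encoder-decoder idea concrete, the following is a minimal PyTorch sketch and not the model from the repository above. The box layout (center plus size, six values), the class name `BoxSeq2Seq`, the hidden size, the residual per-step offset, and the sequence lengths are all assumptions chosen for illustration.

```python
import torch
from torch import nn

BOX_DIM = 6  # assumed box layout: (x, y, z, width, height, depth)


class BoxSeq2Seq(nn.Module):
    """Illustrative RNN encoder-decoder for box sequences (hypothetical, not the authors' model)."""

    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(BOX_DIM, hidden_size, batch_first=True)
        self.decoder_cell = nn.LSTMCell(BOX_DIM, hidden_size)
        self.head = nn.Linear(hidden_size, BOX_DIM)

    def forward(self, observed: torch.Tensor, horizon: int) -> torch.Tensor:
        # observed: (batch, obs_len, BOX_DIM)
        _, (h, c) = self.encoder(observed)
        h, c = h[0], c[0]  # drop the num_layers dimension -> (batch, hidden)
        box = observed[:, -1]  # start decoding from the last observed box
        preds = []
        for _ in range(horizon):
            h, c = self.decoder_cell(box, (h, c))
            box = box + self.head(h)  # assumed design: predict an offset per step
            preds.append(box)
        return torch.stack(preds, dim=1)  # (batch, horizon, BOX_DIM)


model = BoxSeq2Seq()
observed = torch.randn(4, 16, BOX_DIM)  # 4 pedestrians, 16 observed frames (random data)
future = model(observed, horizon=14)
print(future.shape)
```

The decoder here feeds each predicted box back in as the next input, so the prediction horizon can be changed at inference time without retraining a fixed-length output layer.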