We propose a framework for predicting the future trajectories of traffic agents in highly interactive environments. Since autonomous vehicles are equipped with various types of sensors (e.g., a LiDAR scanner, an RGB camera), our work aims to benefit from multiple input modalities that are complementary to each other. The proposed approach consists of two stages: (i) feature encoding, where we discover the motion behavior of the target agent with respect to other directly and indirectly observable influences, extracting such behaviors from multiple perspectives such as the top-down and frontal views; and (ii) cross-modal embedding, where we embed the set of learned behavior representations into a single cross-modal latent space. We construct a generative model and formulate the objective functions with an additional regularizer specifically designed for future prediction. An extensive evaluation on two benchmark driving datasets demonstrates the efficacy of the proposed framework.
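The two-stage structure described above can be sketched in code. The following is a minimal illustrative sketch only, assuming simple linear per-modality encoders and average fusion into the shared latent space; none of these architectural details (layer types, dimensions, fusion rule) are specified in the abstract, and all names and shapes are hypothetical.

```python
# Hypothetical sketch of the two-stage pipeline:
# (i) per-modality behavior encoding, (ii) cross-modal embedding.
# Encoders, dimensions, and the fusion rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def encode(features, W):
    """Stage (i): encode one modality's observations into a behavior vector."""
    return np.tanh(features @ W)

def embed_cross_modal(behaviors, projections):
    """Stage (ii): project each behavior into a shared latent space and fuse."""
    latents = [b @ P for b, P in zip(behaviors, projections)]
    return np.mean(latents, axis=0)  # simple average fusion (an assumption)

# Toy modalities: a top-down (e.g., LiDAR) descriptor and a frontal-view
# (e.g., RGB) descriptor, with made-up feature sizes.
top_down = rng.normal(size=(8,))
frontal = rng.normal(size=(12,))

W_top = rng.normal(size=(8, 16))
W_front = rng.normal(size=(12, 16))
projections = [rng.normal(size=(16, 4)), rng.normal(size=(16, 4))]

behaviors = [encode(top_down, W_top), encode(frontal, W_front)]
z = embed_cross_modal(behaviors, projections)
print(z.shape)  # a single 4-dim cross-modal latent code
```

In the actual framework, a generative model would then decode future trajectories from such a latent code, trained with the paper's prediction-specific regularizer.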