Anticipating the motion of neighboring vehicles is crucial for autonomous driving, especially on congested highways, where even slight variations in motion can result in catastrophic collisions. Accurately predicting a vehicle's future trajectory depends not only on its past trajectory but also, more importantly, on modeling the complex interactions among nearby vehicles. Most state-of-the-art networks built to tackle this problem assume that past trajectory points are readily available, and therefore lack a full end-to-end pipeline that maps video directly to predictions. In this article, we propose a novel end-to-end architecture that takes raw video as input and outputs future trajectory predictions. It first extracts and tracks the 3D locations of nearby vehicles using multi-head attention-based regression networks together with non-linear optimization. This yields the past trajectory points, which are then fed into the trajectory prediction algorithm, an attention-based LSTM encoder-decoder. This architecture models the complicated interdependence between vehicles and accurately predicts the future trajectory points of the surrounding vehicles. The proposed model is evaluated on the large-scale BLVD dataset and has also been implemented in CARLA. The experimental results demonstrate that our approach outperforms various state-of-the-art models.
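To make the idea of attention over neighboring vehicles concrete, the following is a minimal sketch and not the network described above. It assumes that each vehicle's past trajectory has already been encoded into a fixed-length state vector (for example, by an LSTM encoder) and uses generic scaled dot-product attention, which the abstract does not confirm as the form used. The names `attend`, `target_state`, and `neighbor_states`, as well as the numeric values, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, neighbor_states):
    """Scaled dot-product attention (an assumed, generic form).

    Scores each neighbor state by its similarity to the query, then
    returns the attention weights and the weighted-sum context vector.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, state)) / math.sqrt(d)
        for state in neighbor_states
    ]
    weights = softmax(scores)
    context = [
        sum(w * state[i] for w, state in zip(weights, neighbor_states))
        for i in range(d)
    ]
    return weights, context


# Hypothetical encoded states: one target vehicle and three neighbors.
target_state = [1.0, 0.0]
neighbor_states = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights, context = attend(target_state, neighbor_states)
print(weights)  # neighbors most similar to the target get the largest weight
print(context)  # interaction summary a decoder could condition on
```

In a full model, a context vector of this kind would typically be combined with the decoder's hidden state at each prediction step, so that each predicted trajectory point reflects the surrounding vehicles rather than the target's history alone.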