Real-time, accurate prediction of human steering behaviors has wide applications, from developing intelligent traffic systems to deploying autonomous driving systems in both real and simulated worlds. In this paper, we present ContextVAE, a context-aware approach for multi-modal vehicle trajectory prediction. Built upon the backbone architecture of a timewise variational autoencoder, ContextVAE employs a dual attention mechanism for observation encoding that accounts for the environmental context information and the dynamic agents' states in a unified way. By utilizing features extracted from semantic maps during agent state encoding, our approach takes into account both the social features exhibited by agents in the scene and the physical environment constraints to generate map-compliant and socially aware trajectories. We perform extensive testing on the nuScenes prediction challenge, the Lyft Level 5 dataset, and the Waymo Open Motion Dataset to show the effectiveness of our approach and its state-of-the-art performance. On all tested datasets, ContextVAE models are fast to train and provide high-quality multi-modal predictions in real time.