Predicting the trajectories of surrounding agents is still considered one of the most challenging tasks for autonomous driving. In this paper, we introduce a multi-modal trajectory prediction framework based on the transformer network. The semantic maps of each agent are used as inputs to convolutional networks to automatically derive relevant contextual information. A novel auxiliary loss that penalizes unfeasible off-road predictions is also proposed in this study. Experiments on the Lyft l5kit dataset show that the proposed model achieves state-of-the-art performance, substantially improving the accuracy and feasibility of the prediction outcomes.
翻译:暂无翻译