In this paper, we propose THOMAS, a joint multi-agent trajectory prediction framework that enables efficient and consistent prediction of multi-agent, multi-modal trajectories. We present a unified model architecture for fast, simultaneous estimation of agents' future heatmaps, leveraging hierarchical and sparse image generation. We demonstrate that heatmap output enables a higher level of control over the predicted trajectories than vanilla multi-modal trajectory regression, making it possible to incorporate additional constraints for tighter sampling or collision-free predictions in a deterministic way. However, we also highlight that generating scene-consistent predictions goes beyond merely generating collision-free trajectories. We therefore propose a learnable trajectory recombination model that takes as input a set of predicted trajectories for each agent and outputs a consistent, reordered recombination of them. We report our results on the Interaction multi-agent prediction challenge and rank first on the online test leaderboard.