Motion forecasting is critical for self-driving vehicles (SDVs) to plan safe maneuvers. To this end, modern approaches reason about the map, the agents' past trajectories, and their interactions in order to produce accurate forecasts. The predominant approach has been to encode the map and other agents in the reference frame of each target agent. However, this is computationally expensive for multi-agent prediction, as inference must be run once per agent. To tackle this scaling challenge, the solution thus far has been to encode all agents and the map in a shared coordinate frame (e.g., the SDV frame). However, this is sample-inefficient and vulnerable to domain shift (e.g., when the SDV visits uncommon states). In contrast, in this paper we propose an efficient shared encoding for all agents and the map that does not sacrifice accuracy or generalization. Specifically, we leverage pairwise relative positional encodings to represent geometric relationships between the agents and the map elements in a heterogeneous spatial graph. This parameterization makes the model invariant to scene viewpoint and saves online computation by reusing map embeddings computed offline. Our decoder is also viewpoint-agnostic, predicting agent goals on the lane graph to enable diverse and context-aware multimodal prediction. We demonstrate the effectiveness of our approach on the urban Argoverse 2 benchmark as well as on a novel highway dataset.
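To make the viewpoint-invariance claim concrete, the following is a minimal sketch of pairwise relative poses, assuming each agent or map element is summarized by a 2D pose (x, y, heading). The abstract does not specify the exact parameterization or how these quantities are encoded for the graph, so the function names (`pairwise_relative_poses`, `transform_poses`) and the choice of (dx, dy, dheading) are illustrative assumptions and not the authors' implementation.

```python
import numpy as np


def wrap_angle(theta):
    """Wrap angles to the interval [-pi, pi)."""
    return (theta + np.pi) % (2.0 * np.pi) - np.pi


def pairwise_relative_poses(poses):
    """Hypothetical helper: pose of every element j expressed in the frame of element i.

    poses: (N, 3) array of (x, y, heading) in an arbitrary global frame.
    Returns an (N, N, 3) array whose entry [i, j] is (dx, dy, dheading).
    """
    xy = poses[:, :2]
    yaw = poses[:, 2]
    # delta[i, j] = position of j minus position of i, still in the global frame.
    delta = xy[None, :, :] - xy[:, None, :]
    cos_i = np.cos(yaw)[:, None]
    sin_i = np.sin(yaw)[:, None]
    # Rotate each offset by -heading_i so it is expressed in i's local frame.
    dx = cos_i * delta[..., 0] + sin_i * delta[..., 1]
    dy = -sin_i * delta[..., 0] + cos_i * delta[..., 1]
    dyaw = wrap_angle(yaw[None, :] - yaw[:, None])
    return np.stack([dx, dy, dyaw], axis=-1)


def transform_poses(poses, tx, ty, rot):
    """Apply one global rigid transform (a change of scene viewpoint) to all poses."""
    c, s = np.cos(rot), np.sin(rot)
    x = c * poses[:, 0] - s * poses[:, 1] + tx
    y = s * poses[:, 0] + c * poses[:, 1] + ty
    return np.stack([x, y, poses[:, 2] + rot], axis=-1)


# Three illustrative elements (e.g., two agents and one lane segment).
poses = np.array([
    [0.0, 0.0, 0.0],
    [10.0, 0.0, np.pi / 2],
    [-3.0, 4.0, -0.3],
])
rel = pairwise_relative_poses(poses)
rel_moved = pairwise_relative_poses(transform_poses(poses, 25.0, -7.0, 1.1))
print(np.allclose(rel, rel_moved))
```

Because every entry depends only on the relative geometry of two elements, re-expressing the whole scene in a different global frame (for instance, the SDV frame at a later time) leaves the pairwise quantities unchanged. This is the property that would allow map-to-map relationships to be computed once offline and reused.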