Motion prediction is closely tied to the perception of dynamic objects and static map elements in autonomous driving scenarios. In this work, we propose PIP, the first end-to-end Transformer-based framework that jointly and interactively performs online mapping, object detection, and motion prediction. PIP uses map queries, agent queries, and mode queries to encode instance-wise information about map elements, agents, and motion intentions, respectively. Building on this unified query representation, we propose a differentiable multi-task interaction scheme that exploits the correlation between perception and prediction. Even without human-annotated HD maps or agents' historical tracking trajectories as guidance, PIP performs end-to-end multi-agent motion prediction and outperforms tracking-based and HD-map-based methods. PIP provides comprehensive high-level information about the driving scene (a vectorized static map and dynamic objects with motion information) and supports downstream planning and control. Code and models will be released to facilitate further research.
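To make the unified query representation more concrete, the following is a minimal sketch of how map, agent, and mode queries could interact. It is an illustration under stated assumptions, not the PIP implementation: the abstract does not specify the internals of the interaction scheme, so the sketch assumes plain single-head dot-product attention with no learned weights, forms one motion query per (agent, mode) pair by summation, and uses hypothetical names and sizes throughout (`cross_attend`, `DIM`, `NUM_MAP`, `NUM_AGENT`, `NUM_MODE`).

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Update each query with a dot-product attention summary of `context`.

    Assumption: single head, no learned projections, residual addition.
    A real Transformer layer would add projections, norms, and an FFN.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    updated = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, c)) for c in context]
        weights = softmax(scores)
        attended = [
            sum(w * c[d] for w, c in zip(weights, context)) for d in range(dim)
        ]
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def random_queries(count, dim, rng):
    """Stand-in for learned query embeddings (hypothetical initialization)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM, NUM_MAP, NUM_AGENT, NUM_MODE = 8, 5, 3, 4  # hypothetical sizes

map_queries = random_queries(NUM_MAP, DIM, rng)      # one per map element
agent_queries = random_queries(NUM_AGENT, DIM, rng)  # one per detected agent
mode_queries = random_queries(NUM_MODE, DIM, rng)    # one per motion intention

# Perception-side interaction (assumed): agents gather map context.
agent_queries = cross_attend(agent_queries, map_queries)

# One motion query per (agent, mode) pair, formed here by simple addition.
motion_queries = [
    [[a + m for a, m in zip(agent, mode)] for mode in mode_queries]
    for agent in agent_queries
]

# Prediction-side interaction (assumed): each agent's motion queries
# attend to the map queries before a trajectory head would decode them.
motion_queries = [cross_attend(per_agent, map_queries) for per_agent in motion_queries]
```

Every step in the sketch is built from sums, products, and softmax, so the same computation written in an autodiff framework would be differentiable end to end, which is the property the abstract attributes to the interaction scheme.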