Multi-agent trajectory generation is a core problem for autonomous driving and intelligent transportation systems. However, efficiently modeling the dynamic interactions among numerous road users and infrastructure elements in complex scenes remains an open problem. Existing methods typically capture interactions with distance-based or fully connected dense graphs. These graphs introduce many redundant edges and require complex, heavily parameterized encoders, which lowers training and inference efficiency and limits scalability to large, complex traffic scenes. To overcome these limitations, we propose SparScene, a sparse graph learning framework for efficient and scalable traffic scene representation. Instead of relying on distance thresholds, SparScene uses the lane graph topology to construct structure-aware sparse connections between agents and lanes, yielding an efficient yet informative scene graph. A lightweight graph encoder then aggregates agent-map and agent-agent interactions into compact scene representations, substantially improving efficiency and scalability. On the motion prediction benchmark of the Waymo Open Motion Dataset (WOMD), SparScene achieves competitive performance with high efficiency: it generates trajectories for more than 200 agents in a scene within 5 ms, and it scales to more than 5,000 agents and 17,000 lanes with only 54 ms of inference time and 2.9 GB of GPU memory, demonstrating its scalability to large-scale traffic scenes.
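To make the lane-topology idea above concrete, the following is a minimal sketch, not the SparScene implementation. The abstract does not specify the exact connection rule, so the rule used here (link an agent to lanes reachable within `max_hops` successor steps of its own lane, and to agents located on those lanes) is an assumption, as are all names (`reachable_lanes`, `build_sparse_edges`, `dense_edge_count`) and the toy data.

```python
from collections import deque


def reachable_lanes(lane_graph, start, max_hops):
    """Return lanes reachable from `start` within `max_hops` successor steps (BFS)."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        lane = queue.popleft()
        if hops[lane] == max_hops:
            continue  # do not expand beyond the hop budget
        for nxt in lane_graph.get(lane, ()):
            if nxt not in hops:
                hops[nxt] = hops[lane] + 1
                queue.append(nxt)
    return set(hops)


def build_sparse_edges(lane_graph, agent_lane, max_hops=1):
    """Build directed agent-lane and agent-agent edges from lane topology.

    Hypothetical rule (an assumption, not taken from the paper): an agent
    connects to every lane within `max_hops` of its own lane, and to every
    other agent located on one of those lanes.
    """
    agents_on_lane = {}
    for agent, lane in agent_lane.items():
        agents_on_lane.setdefault(lane, []).append(agent)

    agent_lane_edges, agent_agent_edges = set(), set()
    for agent, lane in agent_lane.items():
        for near in reachable_lanes(lane_graph, lane, max_hops):
            agent_lane_edges.add((agent, near))
            for other in agents_on_lane.get(near, ()):
                if other != agent:
                    agent_agent_edges.add((agent, other))
    return agent_lane_edges, agent_agent_edges


def dense_edge_count(num_agents):
    """Directed edge count of a fully connected agent graph, for comparison."""
    return num_agents * (num_agents - 1)


# Toy scene: two disconnected roads, four agents (all values are illustrative).
lane_graph = {"L0": ["L1"], "L1": ["L2"], "L2": [], "L3": ["L4"], "L4": []}
agent_lane = {"a": "L0", "b": "L1", "c": "L2", "d": "L3"}

al_edges, aa_edges = build_sparse_edges(lane_graph, agent_lane, max_hops=1)
print(len(aa_edges), "sparse agent-agent edges vs", dense_edge_count(len(agent_lane)), "dense")
```

In this toy scene the topology rule keeps 2 directed agent-agent edges where a fully connected graph would have 12. Agent `d` on the separate road receives no agent-agent edges at all, regardless of how close it is in Euclidean distance, which is the kind of redundancy a distance threshold cannot rule out.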