Accurate and reliable 3D detection is vital for many applications, including autonomous vehicles and service robots. In this paper, we present MPPNet, a flexible and high-performance framework for 3D temporal object detection with point cloud sequences. We propose a novel three-hierarchy framework with proxy points for multi-frame feature encoding and interaction. The three hierarchies conduct per-frame feature encoding, short-clip feature fusion, and whole-sequence feature aggregation, respectively. To process long point cloud sequences with reasonable computational resources, we propose intra-group feature mixing and inter-group feature attention to form the second and third feature encoding hierarchies, which are recurrently applied to aggregate multi-frame trajectory features. The proxy points not only act as consistent object representations for each frame, but also serve as couriers that facilitate feature interaction between frames. Experiments on the large Waymo Open Dataset show that our approach outperforms state-of-the-art methods by large margins when applied to both short (e.g., 4-frame) and long (e.g., 16-frame) point cloud sequences. Specifically, MPPNet achieves 74.21%, 74.62% and 73.31% for the vehicle, pedestrian and cyclist classes on the LEVEL 2 mAPH metric with 16-frame input.
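As a rough illustration only, the three hierarchies described above might be sketched in NumPy as follows. This is not the MPPNet implementation: every function name is hypothetical, and the linear-plus-ReLU encoder, the mean-based mixing, the per-proxy-point dot-product attention, and the final max-pooling are simplifying assumptions standing in for the learned modules.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def per_frame_encode(proxy_inputs, weight):
    """Hierarchy 1 (assumed form): encode each frame independently.

    proxy_inputs: (T, N, C_in) raw features for N proxy points in T frames.
    weight:       (C_in, C) hypothetical projection matrix.
    Returns (T, N, C).
    """
    return np.maximum(proxy_inputs @ weight, 0.0)


def intra_group_mixing(frame_feats, group_size):
    """Hierarchy 2 (assumed form): fuse the frames inside each short clip.

    A plain mean over the frames of a group replaces the learned mixing.
    Returns (G, N, C) with G = T // group_size.
    """
    T, N, C = frame_feats.shape
    if T % group_size != 0:
        raise ValueError("number of frames must be divisible by group_size")
    groups = frame_feats.reshape(T // group_size, group_size, N, C)
    return groups.mean(axis=1)


def inter_group_attention(group_feats):
    """Hierarchy 3 (assumed form): each group attends to all groups.

    Attention is computed separately for every proxy-point index, so a
    proxy point only exchanges information with its counterparts in time.
    Returns (G, N, C).
    """
    G, N, C = group_feats.shape
    per_point = group_feats.transpose(1, 0, 2)            # (N, G, C)
    scores = per_point @ per_point.transpose(0, 2, 1)     # (N, G, G)
    weights = softmax(scores / np.sqrt(C), axis=-1)
    attended = weights @ per_point                        # (N, G, C)
    return attended.transpose(1, 0, 2)


def aggregate_sequence(proxy_inputs, weight, group_size):
    """Run the three stages and pool over groups into one (N, C) feature."""
    frame_feats = per_frame_encode(proxy_inputs, weight)
    group_feats = intra_group_mixing(frame_feats, group_size)
    group_feats = inter_group_attention(group_feats)
    return group_feats.max(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    inputs = rng.normal(size=(16, 8, 5))   # 16 frames, 8 proxy points
    proj = rng.normal(size=(5, 4))
    print(aggregate_sequence(inputs, proj, group_size=4).shape)
```

Grouping keeps the attention cost tied to the number of groups rather than the number of frames: with 16 frames and a group size of 4, the attention step in this sketch works on a 4 x 4 score matrix per proxy point instead of 16 x 16.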