For a continuous-time stream, it is natural to build a multi-frame 3D detector rather than a single-frame one. Although using more frames may improve performance, previous multi-frame studies used only a very limited number of frames because of the dramatically increased computational and memory cost. To address these issues, we propose a novel on-stream training and prediction framework that, in theory, can employ an infinite number of frames while keeping the same amount of computation as a single-frame detector. This infinite framework (INT) can be used with most existing detectors; as an example, we apply it to the popular CenterPoint and obtain significant latency reductions and performance improvements. We also conduct extensive experiments on two large-scale datasets, nuScenes and Waymo Open Dataset, to demonstrate the scheme's effectiveness and efficiency. Employing INT on CenterPoint yields performance boosts of around 7% (Waymo) and 15% (nuScenes) with only 2 to 4 ms of latency overhead, and the resulting detector is currently SOTA on the Waymo 3D Detection leaderboard.