Detecting 3D objects in Bird's-Eye-View (BEV) currently outperforms other 3D detection paradigms for autonomous driving and robotics. However, transforming image features into BEV requires special operators to perform feature sampling. These operators are not supported on many edge devices, which creates extra obstacles when deploying detectors. To address this problem, we revisit the generation of the BEV representation and propose detecting objects in perspective BEV, a new BEV representation that does not require feature sampling. We demonstrate that perspective BEV features likewise enjoy the benefits of the BEV paradigm. Moreover, perspective BEV improves detection performance by addressing issues caused by feature sampling. Based on this finding, we propose PersDet for high-performance object detection in perspective BEV space. With a simple and memory-efficient structure, PersDet outperforms existing state-of-the-art monocular methods on the nuScenes benchmark, reaching 34.6% mAP and 40.8% NDS when using ResNet-50 as the backbone.