In this paper, we propose SparseDet for end-to-end 3D object detection from point clouds. Existing works on 3D object detection rely on dense object candidates over all locations in a 3D or 2D grid, following the mainstream methods for object detection in 2D images. However, this dense paradigm requires hand-crafted, data-dependent designs to bridge the gap between labels and detections. As a new detection paradigm, SparseDet maintains a fixed set of learnable proposals to represent latent candidates and directly performs classification and localization of 3D objects through stacked transformers. It demonstrates that effective 3D object detection can be achieved without any post-processing such as redundancy removal and non-maximum suppression. With a properly designed network, SparseDet achieves highly competitive detection accuracy while running at a faster speed of 34.5 FPS. We believe this end-to-end paradigm of SparseDet will inspire new thinking on the sparsity of 3D object detection.
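The core idea described above can be illustrated with a minimal, hypothetical numpy sketch: a fixed set of learnable proposal embeddings attends to encoded point-cloud features through one cross-attention stage, after which each refined proposal directly yields a class prediction and a 7-DoF box, with no dense anchors and no NMS. All sizes and weight matrices here are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, feats, d):
    # Single-head attention: each proposal attends to all scene features.
    scores = queries @ feats.T / np.sqrt(d)
    return softmax(scores) @ feats

# Hypothetical sizes: 100 learnable proposals, 64-dim embeddings,
# 500 encoded point-cloud feature vectors.
N, D, M = 100, 64, 500
proposals = rng.normal(size=(N, D))     # fixed learnable proposal set
W_cls = rng.normal(size=(D, 4)) * 0.1   # e.g. 3 object classes + background
W_box = rng.normal(size=(D, 7)) * 0.1   # (x, y, z, w, l, h, yaw)

feats = rng.normal(size=(M, D))         # features from a point-cloud encoder

# One decoder stage: refine proposals against the scene, then predict
# one class score vector and one box per proposal (sparse, one-to-one).
refined = proposals + cross_attention(proposals, feats, D)
cls_logits = refined @ W_cls            # shape (100, 4)
boxes = refined @ W_box                 # shape (100, 7)
```

In the full model this stage would be stacked several times with learned weights, but the output structure is the same: a small fixed set of predictions read off directly, rather than thousands of dense candidates filtered by NMS.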