Recently, directly detecting 3D objects from 3D point clouds has received increasing attention. To extract object representations from an irregular point cloud, existing methods usually take a point grouping step to assign points to an object candidate so that a PointNet-like network can derive object features from the grouped points. However, inaccurate point assignments caused by the hand-crafted grouping scheme degrade the performance of 3D object detection. In this paper, we present a simple yet effective method for directly detecting 3D objects from a 3D point cloud. Instead of grouping local points for each object candidate, our method computes the feature of an object from all the points in the point cloud with the help of the attention mechanism in Transformers \cite{vaswani2017attention}, where the contribution of each point is automatically learned during network training. With an improved attention stacking scheme, our method fuses object features from different stages and generates more accurate detection results. With few bells and whistles, the proposed method achieves state-of-the-art 3D object detection performance on two widely used benchmarks, ScanNet V2 and SUN RGB-D. The code and models are publicly available at \url{https://github.com/zeliu98/Group-Free-3D}.
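The core idea above — letting each object candidate aggregate features from all points via attention, with per-point contributions learned rather than hand-assigned — can be sketched as a single cross-attention step. This is a minimal illustration, not the paper's implementation: the projection matrices below are random stand-ins for trained parameters, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, point_feats, d=16):
    """Object-candidate queries attend to ALL point features.

    The attention weights play the role of a learned, soft point-to-object
    assignment, replacing a hand-crafted grouping step. Wq/Wk/Wv are
    hypothetical random projections standing in for learned weights.
    """
    Wq = rng.standard_normal((queries.shape[-1], d))
    Wk = rng.standard_normal((point_feats.shape[-1], d))
    Wv = rng.standard_normal((point_feats.shape[-1], d))
    Q = queries @ Wq          # (num_objects, d)
    K = point_feats @ Wk      # (num_points, d)
    V = point_feats @ Wv      # (num_points, d)
    # Each row is a distribution over all points: the contribution of
    # every point to that object candidate.
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    return attn @ V, attn     # fused object features, per-point weights

# Toy scene: 4 object candidates, 1024 points, 32-dim features each.
obj_queries = rng.standard_normal((4, 32))
points = rng.standard_normal((1024, 32))
obj_feats, attn = cross_attention(obj_queries, points)
```

Stacking several such layers and fusing the object features produced at different stages corresponds to the attention stacking scheme mentioned in the abstract.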