We introduce a 3D instance representation, termed instance kernels, in which each instance is represented by a one-dimensional vector that encodes its semantic, positional, and shape information. We show that instance kernels enable simple mask inference by scanning the kernels over the entire scene, avoiding the heavy reliance on proposals or heuristic clustering algorithms found in standard 3D instance segmentation pipelines. The idea of instance kernels is inspired by the recent success of dynamic convolutions in 2D/3D instance segmentation. However, we find it non-trivial to represent 3D instances because point cloud data is unordered and unstructured; for example, poor instance localization can significantly degrade the instance representation. To remedy this, we construct a novel 3D instance encoding paradigm. First, potential instance centroids are localized as candidates. Then, a candidate merging scheme is devised to simultaneously aggregate duplicated candidates and collect context around the merged centroids to form the instance kernels. Once instance kernels are available, instance masks can be reconstructed via dynamic convolutions whose weights are conditioned on the instance kernels. The whole pipeline is instantiated as a dynamic kernel network (DKNet). Results show that DKNet outperforms the state of the art on both the ScanNetV2 and S3DIS datasets, with better instance localization. Code is available at https://github.com/W1zheng/DKNet.
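The mask-decoding step described above can be illustrated with a minimal NumPy sketch. It is not DKNet's actual implementation: the function name `dynamic_masks`, the single linear weight generator (`w_gen`, `b_gen`), and all tensor sizes are assumptions made for illustration. It only shows the general idea of a dynamic convolution, where each instance kernel is turned into the weights of a 1x1 (per-point) convolution that is then scanned over every point in the scene.

```python
import numpy as np

def dynamic_masks(point_feats, kernels, w_gen, b_gen):
    """Decode one binary mask per instance kernel (illustrative sketch only).

    point_feats: (N, C) per-point features from some backbone (assumed).
    kernels:     (K, D) one vector per instance.
    w_gen:       (D, C + 1) hypothetical generator mapping a kernel to the
                 C weights and 1 bias of a per-point (1x1) convolution.
    b_gen:       (C + 1,) bias of that hypothetical generator.
    Returns a boolean array of shape (K, N).
    """
    params = kernels @ w_gen + b_gen           # (K, C + 1)
    weights = params[:, :-1]                   # (K, C) conv weights per instance
    bias = params[:, -1:]                      # (K, 1) conv bias per instance
    logits = weights @ point_feats.T + bias    # (K, N) every kernel scans every point
    # sigmoid(logit) > 0.5 is equivalent to logit > 0
    return logits > 0

# Tiny hand-checkable case: 2 points, 1 instance.
toy_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_kernel = np.array([[1.0, -1.0]])
toy_w_gen = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
toy_b_gen = np.zeros(3)
toy_masks = dynamic_masks(toy_feats, toy_kernel, toy_w_gen, toy_b_gen)

# Random scene: 1000 points, 16 feature channels, 3 instances, 32-dim kernels.
rng = np.random.default_rng(0)
masks = dynamic_masks(
    rng.normal(size=(1000, 16)),
    rng.normal(size=(3, 32)),
    rng.normal(size=(32, 17)),
    np.zeros(17),
)
print(masks.shape)
```

Because each instance contributes only its own small set of generated weights, adding an instance adds one row to `kernels` rather than a new proposal or clustering pass, which is the property the abstract emphasizes.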