Existing 3D instance segmentation methods are predominantly bottom-up designs: a manually tuned algorithm groups points into clusters, followed by a refinement network. However, because they depend on the quality of these clusters, such methods produce unreliable results when (1) nearby objects of the same semantic class are packed together, or (2) large objects have loosely connected regions. To address these limitations, we introduce ISBNet, a novel cluster-free method that represents instances as kernels and decodes instance masks via dynamic convolution. To efficiently generate high-recall and discriminative kernels, we propose a simple strategy named Instance-aware Farthest Point Sampling to sample candidates, and we leverage a local aggregation layer inspired by PointNet++ to encode candidate features. Moreover, we show that predicting 3D axis-aligned bounding boxes and leveraging them in the dynamic convolution further boosts performance. Our method sets new state-of-the-art results on ScanNetV2 (55.9), S3DIS (60.8), and STPLS3D (49.2) in terms of AP, while retaining a fast inference time (237ms per scene on ScanNetV2).
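The abstract names two core mechanisms without detailing them: sampling instance candidates via farthest point sampling, and decoding per-instance masks by applying predicted kernels as a dynamic convolution over point features. The sketch below illustrates both in their simplest form; it is background intuition only, not the paper's Instance-aware FPS (which additionally uses instance cues) nor its full multi-layer decoder, and all function names here are illustrative.

```python
import numpy as np

def farthest_point_sampling(points, k):
    """Vanilla FPS: iteratively pick the point farthest from the
    already-selected set. ISBNet's Instance-aware FPS extends this
    idea; this is only the standard geometric baseline."""
    n = points.shape[0]
    selected = [0]                 # arbitrary starting point
    dist = np.full(n, np.inf)      # distance to nearest selected point
    for _ in range(k - 1):
        # refresh distances using the most recently selected point
        d = np.linalg.norm(points - points[selected[-1]], axis=1)
        dist = np.minimum(dist, d)
        selected.append(int(np.argmax(dist)))
    return np.array(selected)

def decode_masks(point_feats, kernels):
    """Mask decoding via dynamic convolution in the simplest 1x1 case:
    each candidate's predicted kernel (K, C) is convolved with the
    per-point features (N, C), which reduces to a matrix product,
    yielding one soft mask over all N points per candidate."""
    logits = point_feats @ kernels.T          # (N, K)
    return 1.0 / (1.0 + np.exp(-logits))      # sigmoid mask scores
```

In this 1x1 view the "convolution" is just an inner product per point, which is why kernel-based decoding avoids any explicit clustering step: every candidate directly scores all points.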