Deep learning depends on large amounts of labeled training data. Manual labeling is expensive and represents a bottleneck, especially for tasks such as segmentation, where labels must be assigned down to the level of individual points. That challenge is even more daunting for 3D data: 3D point clouds contain millions of points per scene, and their accurate annotation is markedly more time-consuming. The situation is further aggravated by the complexity of user interfaces for 3D point clouds, which slows down annotation even more. For 2D image segmentation, interactive techniques have become common: user feedback in the form of a few clicks guides a segmentation algorithm -- nowadays usually a neural network -- to achieve an accurate labeling with minimal effort. Surprisingly, interactive segmentation of 3D scenes has not been explored much. Previous work has attempted to obtain accurate 3D segmentation masks using human feedback from the 2D domain, which is only possible if correctly aligned images are available together with the 3D point cloud, and it involves switching between the 2D and 3D domains. Here, we present an interactive 3D object segmentation method in which the user interacts directly with the 3D point cloud. Importantly, our model does not require training data from the target domain: when trained on ScanNet, it performs well on several other datasets with different data characteristics as well as different object classes. Moreover, our method is orthogonal to supervised (instance) segmentation methods and can be combined with them to refine automatic segmentations with minimal human effort.