We propose an interactive approach for 3D instance segmentation, where users iteratively collaborate with a deep learning model to segment objects in a 3D point cloud directly. Current methods for 3D instance segmentation are generally trained in a fully-supervised fashion, which requires large amounts of costly training labels and does not generalize well to classes unseen during training. The few existing works that obtain 3D segmentation masks through human interaction rely on user feedback in the 2D image domain. As a consequence, users are required to constantly switch between 2D images and 3D representations, and custom architectures are employed to combine multiple input modalities, so integration with existing standard 3D models is not straightforward. The core idea of this work is to enable users to interact directly with 3D point clouds by clicking on 3D objects of interest~(or their background) to interactively segment the scene in an open-world setting. Specifically, our method does not require training data from any target domain and can adapt to new environments where no appropriate training sets are available. Our system continuously adjusts the object segmentation based on the user feedback and achieves accurate dense 3D segmentation masks with minimal human effort (a few clicks per object). Besides its potential for efficient labeling of large-scale and varied 3D datasets, our approach, where the user directly interacts with the 3D environment, enables new applications in AR/VR and human-robot interaction. A minimal sketch of such a click-driven refinement loop is given below.
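The sketch below illustrates, under stated assumptions, the kind of interactive loop the abstract describes: the user alternates between clicking on the object (positive) or its background (negative) in the point cloud, and the model re-predicts a per-point mask conditioned on all clicks so far. The function and hook names (`get_user_click`, `user_is_satisfied`) and the model interface are hypothetical and not taken from the paper's code.

```python
import numpy as np

# Hypothetical interface: `model` maps a point cloud plus positive/negative
# click indices to a per-point foreground probability. All names here are
# illustrative only and do not correspond to the paper's actual code.
def interactive_segmentation(model, points, max_clicks=10):
    """Iteratively refine a 3D instance mask from user clicks.

    points : (N, 3) float array, the raw point cloud.
    Returns a boolean per-point mask for the selected object.
    """
    pos_clicks, neg_clicks = [], []          # point indices marked by the user
    mask = np.zeros(len(points), dtype=bool)

    for _ in range(max_clicks):
        # In practice this comes from a 3D viewer: the user clicks the object
        # of interest (positive) or the background (negative).
        idx, is_positive = get_user_click(points, mask)   # hypothetical UI hook
        (pos_clicks if is_positive else neg_clicks).append(idx)

        # The model re-predicts the mask conditioned on all clicks so far.
        prob = model(points, pos_clicks, neg_clicks)      # (N,) values in [0, 1]
        mask = prob > 0.5

        if user_is_satisfied(mask):                       # hypothetical UI hook
            break
    return mask
```

Because the loop only needs the raw point cloud and the click indices, it operates entirely in 3D and does not require any 2D image input, which is the property the abstract emphasizes.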