We introduce an approach for selecting objects in neural volumetric 3D representations, such as multi-plane images (MPI) and neural radiance fields (NeRF). Our approach takes a set of foreground and background 2D user scribbles in one view and automatically estimates a 3D segmentation of the desired object, which can be rendered into novel views. To achieve this result, we propose a novel voxel feature embedding that incorporates the neural volumetric 3D representation and multi-view image features from all input views. To evaluate our approach, we introduce a new dataset of human-provided segmentation masks for depicted objects in real-world multi-view scene captures. We show that our approach outperforms strong baselines, including 2D segmentation and 3D segmentation approaches adapted to our task.
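To make the described pipeline concrete, below is a minimal sketch (not the authors' implementation) of scribble-driven voxel classification. It assumes we already have a voxel grid of feature embeddings, where each voxel's embedding combines features from the volumetric representation with aggregated multi-view image features, plus embeddings gathered at the 3D points hit by the user's foreground/background scribbles. All names and the logistic-regression classifier are hypothetical stand-ins for illustration.

```python
# Hypothetical sketch: classify voxels as object/background from a few
# scribble-labeled feature embeddings, then label the whole voxel grid.
import numpy as np

def fit_voxel_classifier(fg_feats, bg_feats, lr=0.1, steps=500):
    """Logistic-regression stand-in for the foreground/background classifier."""
    X = np.concatenate([fg_feats, bg_feats])           # (N, D) voxel embeddings
    y = np.concatenate([np.ones(len(fg_feats)),        # 1 = foreground scribble
                        np.zeros(len(bg_feats))])      # 0 = background scribble
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):                             # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))         # sigmoid probabilities
        g = p - y                                      # gradient of the loss w.r.t. logits
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def segment_voxels(voxel_feats, w, b, thresh=0.5):
    """Label every voxel; the resulting 3D mask can gate novel-view rendering."""
    D = voxel_feats.shape[-1]
    logits = voxel_feats.reshape(-1, D) @ w + b
    mask = 1.0 / (1.0 + np.exp(-logits)) > thresh
    return mask.reshape(voxel_feats.shape[:-1])        # boolean (X, Y, Z) grid

# Toy usage with random vectors standing in for real voxel feature embeddings.
rng = np.random.default_rng(0)
fg = rng.normal(1.0, 0.5, (200, 16))                   # scribbled foreground samples
bg = rng.normal(-1.0, 0.5, (200, 16))                  # scribbled background samples
w, b = fit_voxel_classifier(fg, bg)
grid = rng.normal(0.0, 1.0, (32, 32, 32, 16))          # voxel embedding grid
mask = segment_voxels(grid, w, b)                      # per-voxel object selection
print(mask.shape, mask.mean())
```

The sketch only illustrates the overall flow (scribbles supply sparse 3D supervision; a classifier over per-voxel embeddings propagates the selection to the full volume); the paper's actual voxel feature embedding and segmentation model are more involved.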