Joint representation of geometry, colour and semantics using a 3D neural field enables accurate dense labelling of scenes from ultra-sparse interactions as a user reconstructs a scene in real-time using a handheld RGB-D sensor. Our iLabel system requires no training data, yet can densely label scenes more accurately than standard methods trained on large, expensively labelled image datasets. Furthermore, it works in an 'open set' manner, with semantic classes defined on the fly by the user. iLabel's underlying model is a multilayer perceptron (MLP), trained from scratch in real-time to learn a joint neural scene representation. The scene model is updated and visualised in real-time, allowing the user to focus interactions to achieve efficient labelling. A room or similar scene can be accurately labelled into 10+ semantic categories with only a few tens of clicks. Quantitative labelling accuracy scales strongly with the number of clicks and quickly surpasses standard pre-trained semantic segmentation methods. We also demonstrate a hierarchical labelling variant.
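To make the architecture concrete, the following is a minimal sketch, not the authors' implementation, of the kind of joint neural field the abstract describes: a single MLP mapping a 3D point to geometry (density), colour, and per-class semantic logits, with semantics supervised only at a handful of user-clicked points. The class `JointSceneMLP`, the layer widths, the Fourier positional encoding, and the loss weighting are all illustrative assumptions; depth supervision via volume rendering, which the real system would use to train geometry, is omitted for brevity.

```python
# Hypothetical sketch of a joint geometry/colour/semantics neural field.
# Sizes, encoding, and losses are assumptions, not values from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointSceneMLP(nn.Module):
    def __init__(self, num_classes: int = 10, hidden: int = 256, n_freqs: int = 10):
        super().__init__()
        self.n_freqs = n_freqs
        in_dim = 3 + 3 * 2 * n_freqs  # xyz plus sin/cos positional encoding
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)              # geometry
        self.colour_head = nn.Linear(hidden, 3)               # RGB
        self.semantic_head = nn.Linear(hidden, num_classes)   # semantic logits

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Fourier positional encoding of 3D points, common in neural fields.
        freqs = 2.0 ** torch.arange(self.n_freqs, device=x.device) * torch.pi
        angles = x[..., None] * freqs                         # (N, 3, n_freqs)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)
        return torch.cat([x, enc], dim=-1)

    def forward(self, x: torch.Tensor):
        h = self.trunk(self.encode(x))
        return (self.density_head(h),
                torch.sigmoid(self.colour_head(h)),
                self.semantic_head(h))

# One illustrative online update: colour is supervised densely by the RGB-D
# stream, while semantics are supervised only at the few clicked points.
model = JointSceneMLP(num_classes=10)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

pts = torch.rand(1024, 3)               # sampled 3D points (stand-in data)
tgt_rgb = torch.rand(1024, 3)           # colours from the RGB-D frame (stand-in)
click_idx = torch.tensor([5, 77, 310])  # ultra-sparse user clicks
click_lbl = torch.tensor([0, 3, 7])     # classes chosen on the fly per click

density, rgb, logits = model(pts)
loss = F.mse_loss(rgb, tgt_rgb)         # dense self-supervised colour term
loss = loss + F.cross_entropy(logits[click_idx], click_lbl)  # sparse click term
opt.zero_grad()
loss.backward()
opt.step()
```

Because all three heads share one trunk, gradients from even a few semantic clicks propagate through features already shaped by dense geometry and colour supervision, which is one plausible reading of why such sparse interactions can yield dense, accurate labels.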