We propose CLA-NeRF -- a Category-Level Articulated Neural Radiance Field that can perform view synthesis, part segmentation, and articulated pose estimation. CLA-NeRF is trained at the object category level using neither CAD models nor depth, only a set of RGB images with ground-truth camera poses and part segmentations. During inference, it takes only a few RGB views (i.e., few-shot) of an unseen 3D object instance within a known category to infer the object's part segmentation and neural radiance field. Given an articulated pose as input, CLA-NeRF performs articulation-aware volume rendering to generate the corresponding RGB image at any camera pose. Moreover, the articulated pose of an object can be estimated via inverse rendering. In our experiments, we evaluate the framework across five categories on both synthetic and real-world data. In all cases, our method produces realistic deformations and accurate articulated pose estimates. We believe that both few-shot articulated object rendering and articulated pose estimation open the door for robots to perceive and interact with unseen articulated objects.
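To make the inverse-rendering step concrete, below is a minimal sketch of how an articulated pose could be recovered by gradient descent on a photometric loss through a differentiable renderer. The function `render` and the single-joint parameterization `theta` are illustrative assumptions, not the paper's actual interface; the abstract only states that the pose is estimated via inverse rendering.

```python
import torch

def estimate_articulated_pose(render, observed_rgb, camera_pose,
                              init_angle=0.0, steps=200, lr=1e-2):
    """Hypothetical inverse-rendering loop for articulated pose estimation.

    `render(theta, camera_pose)` stands in for an articulation-aware
    differentiable volume renderer (e.g., CLA-NeRF's); its exact signature
    is assumed here for illustration.
    """
    # Single joint angle, optimized directly; real objects may have
    # several articulation degrees of freedom.
    theta = torch.tensor(init_angle, requires_grad=True)
    opt = torch.optim.Adam([theta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rendered = render(theta, camera_pose)          # differentiable rendering
        loss = torch.mean((rendered - observed_rgb) ** 2)  # photometric loss
        loss.backward()                                # gradients flow through the renderer
        opt.step()
    return theta.detach()
```

In this formulation, the same renderer that synthesizes novel views doubles as the forward model for pose estimation: the joint angle that minimizes the photometric discrepancy with the observed image is taken as the estimate.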