We extend neural 3D representations to allow intuitive and interpretable user control beyond novel view rendering (i.e., camera control). We let the user annotate which part of the scene they wish to control with only a small number of mask annotations in the training images. Our key idea is to treat these attributes as latent variables that are regressed by the neural network given the scene encoding. This leads to a few-shot learning framework, in which the framework discovers attributes automatically when annotations are not provided. We apply our method to various scenes with different types of controllable attributes (e.g., expression control on human faces, or state control in the movement of inanimate objects). Overall, we demonstrate, to the best of our knowledge, for the first time, novel view and novel attribute re-rendering of scenes from a single video.
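To make the core idea concrete, the following is a minimal PyTorch sketch of treating attributes as latent variables regressed from a scene encoding and then used to condition a radiance field. All module names, dimensions, and hyperparameters (AttributeRegressor, ConditionedRadianceField, enc_dim, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class AttributeRegressor(nn.Module):
    """Maps a scene encoding to K controllable attribute values (assumed design)."""
    def __init__(self, enc_dim=64, num_attributes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(enc_dim, 128), nn.ReLU(),
            nn.Linear(128, num_attributes), nn.Tanh(),  # attributes in [-1, 1]
        )

    def forward(self, scene_encoding):
        return self.mlp(scene_encoding)

class ConditionedRadianceField(nn.Module):
    """NeRF-style MLP that outputs RGB + density, conditioned on attributes."""
    def __init__(self, pos_dim=63, num_attributes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pos_dim + num_attributes, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 4),  # RGB + sigma
        )

    def forward(self, pos_enc, attributes):
        return self.mlp(torch.cat([pos_enc, attributes], dim=-1))

regressor = AttributeRegressor()
field = ConditionedRadianceField()

scene_encoding = torch.randn(1024, 64)  # per-sample scene encoding (assumed shape)
pos_enc = torch.randn(1024, 63)         # positional encoding of 3D sample points

# Training: attributes are regressed as latent variables; the few mask
# annotations supervise them only where provided, and unannotated attributes
# are discovered automatically by the same few-shot objective.
attrs = regressor(scene_encoding)
rgb_sigma = field(pos_enc, attrs)

# Test time: the user overrides the regressed values to re-render the scene
# with a novel attribute setting (e.g., a different facial expression).
user_attrs = torch.tensor([0.8, -0.3]).expand(1024, -1)
rgb_sigma_edit = field(pos_enc, user_attrs)
```

The design choice this sketch illustrates is that control is exposed as a small, interpretable conditioning vector rather than baked into the scene representation, so the same field renders both inferred and user-specified attribute states.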