Synthesizing novel views of dynamic humans from stationary monocular cameras is a practical and increasingly popular setup: it requires no static scenes, controlled environments, or specialized capture hardware. In contrast to techniques that exploit multi-view observations to constrain the modeling, the problem of modeling a dynamic scene from a single fixed viewpoint is significantly more under-constrained and ill-posed. In this paper, we introduce Neural Motion Consensus Flow (MoCo-Flow), a representation that models the dynamic scene as a 4D continuous, time-variant function. The representation is learned via an optimization that fits a dynamic scene model to minimize the rendering error over all observed images. At the heart of our work lies a novel optimization formulation that constrains the motion flow with a motion consensus regularization. We extensively evaluate MoCo-Flow on several datasets containing human motions of varying complexity, and compare, both qualitatively and quantitatively, against several baseline methods and variants of our method. Pretrained models, code, and data will be released for research purposes upon paper acceptance.
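To make the abstract's description concrete, the following is a minimal sketch in PyTorch of the two ingredients it names: a 4D continuous, time-variant scene function mapping a spatial point and a timestamp to color and density, and a consensus-style regularizer on a motion flow field. All class and function names (TimeVariantRadianceField, motion_consensus_regularizer, the stub flow network) are hypothetical illustrations under our own assumptions, not the authors' implementation; in particular, the regularizer shown here is a generic temporal-agreement penalty, not the paper's exact formulation.

```python
# Minimal sketch (PyTorch). All names are hypothetical illustrations of
# the abstract's description, not MoCo-Flow's actual code.
import torch
import torch.nn as nn


class TimeVariantRadianceField(nn.Module):
    """4D continuous function F(x, y, z, t) -> (RGB color, density)."""

    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 color channels + 1 density
        )

    def forward(self, xyz, t):
        # xyz: (N, 3) spatial samples; t: (N, 1) timestamps.
        xyzt = torch.cat([xyz, t], dim=-1)
        out = self.mlp(xyzt)
        rgb = torch.sigmoid(out[..., :3])  # colors in [0, 1]
        sigma = torch.relu(out[..., 3:])   # non-negative density
        return rgb, sigma


def motion_consensus_regularizer(flow_fn, xyz, t, dt=1e-2):
    """Hypothetical consensus-style penalty: encourage the motion flow
    predicted at nearby timestamps to agree (temporal smoothness)."""
    flow_now = flow_fn(xyz, t)
    flow_next = flow_fn(xyz, t + dt)
    return ((flow_next - flow_now) ** 2).mean()


if __name__ == "__main__":
    field = TimeVariantRadianceField()
    xyz = torch.rand(1024, 3)
    t = torch.rand(1024, 1)
    rgb, sigma = field(xyz, t)
    # A separate flow network would predict per-point motion; here it is
    # stubbed with a tiny MLP purely for illustration.
    flow_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
    flow_fn = lambda p, s: flow_net(torch.cat([p, s], dim=-1))
    reg = motion_consensus_regularizer(flow_fn, xyz, t)
```

Under this reading, the training objective the abstract describes would combine a photometric rendering loss over all observed images with a weighted copy of the regularizer, e.g. loss = render_loss + lambda_reg * reg, where lambda_reg is an assumed hyperparameter.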