Synthesizing novel views of dynamic humans from a stationary monocular camera is a specialized but desirable setup. It is particularly attractive because it does not require static scenes, controlled environments, or specialized capture hardware. Compared with techniques that exploit multi-view observations, modeling a dynamic scene from a single view is significantly more under-constrained and ill-posed. In this paper, we introduce Neural Motion Consensus Flow (MoCo-Flow), a representation that models dynamic humans captured by stationary monocular cameras using a 4D continuous time-variant function. We learn the proposed representation by optimizing for a dynamic scene that minimizes the total rendering error over all observed images. At the heart of our work lies a carefully designed optimization scheme, which includes a dedicated initialization step and is constrained by a motion consensus regularization on the estimated motion flow. We extensively evaluate MoCo-Flow on several datasets containing human motions of varying complexity, and compare it, both qualitatively and quantitatively, against several baselines and ablated variants of our method, demonstrating the efficacy and merits of the proposed approach. The pretrained model, code, and data will be released for research purposes upon paper acceptance.