We present a method for learning the 3D geometry and physics parameters of a dynamic scene from only a monocular RGB video. To decouple the learning of the underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF), which serves as a reference frame, together with a time-conditioned deformation field. We further bridge this neural geometry representation and a differentiable physics simulator by designing a two-way conversion between the neural field and its corresponding hexahedral mesh, enabling us to estimate physics parameters from the source video by minimizing a cycle consistency loss. Our method also allows a user to interactively edit 3D objects from the source video by modifying the recovered hexahedral mesh and propagating the edits back to the neural field representation. Experiments show that our method achieves superior mesh and video reconstruction of dynamic scenes compared to competing neural field approaches, and we provide extensive examples that demonstrate its ability to extract useful 3D representations from videos captured with consumer-grade cameras.
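To make the representation concrete, the following is a minimal PyTorch sketch of the canonical-SDF-plus-deformation-field idea described above. The class names (`DynamicSDF`, `MLP`), network sizes, and the simple offset-based warp are illustrative assumptions, not the paper's actual architecture.

```python
# A minimal sketch (not the authors' implementation) of a time-invariant
# canonical SDF queried through a time-conditioned deformation field.
# All names and network sizes are illustrative assumptions.
import torch
import torch.nn as nn


class MLP(nn.Module):
    """Small fully connected network used for both fields."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 128, depth: int = 4):
        super().__init__()
        layers, dim = [], in_dim
        for _ in range(depth):
            layers += [nn.Linear(dim, hidden), nn.ReLU()]
            dim = hidden
        layers.append(nn.Linear(dim, out_dim))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class DynamicSDF(nn.Module):
    """Canonical (reference-frame) SDF plus time-conditioned deformation field."""

    def __init__(self):
        super().__init__()
        self.canonical_sdf = MLP(in_dim=3, out_dim=1)  # time-invariant geometry
        self.deformation = MLP(in_dim=4, out_dim=3)    # (x, t) -> offset into the canonical frame

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Warp observed points into the canonical frame, then evaluate the shared SDF there.
        offset = self.deformation(torch.cat([x, t], dim=-1))
        return self.canonical_sdf(x + offset)


# Usage: signed distances for 1024 points sampled at time t = 0.5.
model = DynamicSDF()
points = torch.rand(1024, 3)
times = torch.full((1024, 1), 0.5)
sdf_values = model(points, times)  # shape (1024, 1)
```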