Humans have a strong intuitive understanding of physical processes such as falling fluid: a mere glimpse of such a scene suffices, an ability derived from our immersive visual experiences accumulated in memory. This work achieves such photo-to-fluid-dynamics reconstruction by learning from unannotated videos, without any supervision from ground-truth fluid dynamics. In a nutshell, a differentiable Euler simulator, modeled with a ConvNet-based pressure projection solver, is integrated with a volumetric renderer, supporting end-to-end, coherent differentiable simulation and rendering. By endowing each sampled point with a fluid volume value, we derive a NeRF-like differentiable renderer dedicated to fluid data; thanks to this volume-augmented representation, fluid dynamics can be inversely inferred from the error signal between the rendered result and the ground-truth video frame (i.e., inverse rendering). Experiments on our generated Fluid Fall datasets and the DPI Dam Break dataset demonstrate both the effectiveness and the generalization ability of our method.
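To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the loop the abstract outlines: a small ConvNet stands in for the pressure projection solver, a toy semi-Lagrangian step advects a fluid volume grid, and a NeRF-like alpha-compositing pass renders it so a photometric loss against a video frame can be backpropagated through simulation and rendering alike. Every name (PressureProjectionNet, advect, render), shape, and hyperparameter here is an illustrative assumption, not the paper's actual architecture.

```python
# Hypothetical sketch of end-to-end differentiable simulate-then-render
# training; module names and shapes are assumptions, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PressureProjectionNet(nn.Module):
    """ConvNet standing in for the learned pressure projection solver."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, 1, 3, padding=1),  # scalar pressure field
        )

    def forward(self, velocity):                   # velocity: (B, 3, D, H, W)
        pressure = self.net(velocity)              # (B, 1, D, H, W)
        # Subtract the pressure gradient so the field becomes (approximately)
        # divergence-free, as in an Eulerian projection step.
        grad = torch.stack(
            torch.gradient(pressure.squeeze(1), dim=(1, 2, 3)), dim=1)
        return velocity - grad


def advect(density, velocity, dt=0.1):
    """Toy semi-Lagrangian advection of a fluid volume grid."""
    B, _, D, H, W = velocity.shape
    # Base sampling grid in [-1, 1]^3, (x, y, z) ordering for grid_sample.
    zs, ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, D), torch.linspace(-1, 1, H),
        torch.linspace(-1, 1, W), indexing="ij")
    base = torch.stack((xs, ys, zs), dim=-1).expand(B, D, H, W, 3)
    offset = dt * velocity.permute(0, 2, 3, 4, 1)  # (B, D, H, W, 3)
    return F.grid_sample(density, base - offset, align_corners=True)


def render(density):
    """NeRF-like volume rendering: each sampled point carries a fluid
    volume value; here we simply alpha-composite the grid along depth."""
    sigma = F.softplus(density)                    # non-negative volume value
    alpha = 1.0 - torch.exp(-sigma)                # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-6, dim=2)
    weights = alpha * torch.cat(                   # exclusive transmittance
        [torch.ones_like(trans[:, :, :1]), trans[:, :, :-1]], dim=2)
    return weights.sum(dim=2)                      # (B, 1, H, W) image


if __name__ == "__main__":
    solver = PressureProjectionNet()
    opt = torch.optim.Adam(solver.parameters(), lr=1e-3)
    vel = torch.randn(1, 3, 16, 16, 16)            # initial velocity guess
    rho = torch.rand(1, 1, 16, 16, 16)             # initial fluid volume grid
    target = torch.rand(1, 1, 16, 16)              # stand-in video frame
    for _ in range(10):
        opt.zero_grad()
        v = solver(vel)                            # differentiable projection
        rho_next = advect(rho, v)                  # differentiable simulation
        frame = render(rho_next)                   # differentiable rendering
        loss = F.mse_loss(frame, target)           # photometric error signal
        loss.backward()                            # gradients flow from pixels
        opt.step()                                 # back into the simulator
```

The point of the sketch is the gradient path: because advection, projection, and compositing are all differentiable tensor operations, the pixel-space error alone drives the solver, mirroring the unsupervised inverse-rendering setup the abstract claims.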