Convenient 4D modeling of human-object interactions is essential for numerous applications. However, monocular tracking and rendering of complex interaction scenarios remain challenging. In this paper, we propose Instant-NVR, a neural approach for instant volumetric human-object tracking and rendering using a single RGBD camera. It bridges traditional non-rigid tracking with recent instant radiance field techniques via a multi-thread tracking-rendering mechanism. In the tracking front-end, we adopt a robust human-object capture scheme to provide sufficient motion priors. We further introduce a separated instant neural representation with a novel hybrid deformation module for the interacting scene. We also provide an on-the-fly reconstruction scheme of the dynamic/static radiance fields via efficient motion-prior searching. Moreover, we introduce an online key frame selection scheme and a rendering-aware refinement strategy to significantly improve the appearance details for online novel-view synthesis. Extensive experiments demonstrate the effectiveness and efficiency of our approach for the instant generation of human-object radiance fields on the fly, notably achieving real-time photo-realistic novel view synthesis under complex human-object interactions.
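To make the multi-thread tracking-rendering mechanism concrete, the following is a minimal sketch of a producer-consumer layout in which a tracking front-end hands motion priors to a rendering back-end through a bounded queue. It is an illustration under assumptions, not the authors' implementation: the names `run_pipeline`, `track`, `update`, and `max_pending` are hypothetical, and the two callables stand in for the non-rigid tracker and the radiance-field update, whose details are not given here.

```python
import queue
import threading


def run_pipeline(frames, track, update, max_pending=4):
    """Run `track` on a front-end thread and `update` on a back-end thread.

    `track` and `update` are hypothetical placeholders for the tracking
    step and the radiance-field update step, respectively.
    """
    pending = queue.Queue(maxsize=max_pending)  # bounded hand-off buffer
    sentinel = object()                         # marks the end of the stream
    results = []

    def front_end():
        try:
            for frame in frames:
                # put() blocks while the queue is full, so the tracker
                # cannot run arbitrarily far ahead of the back-end.
                pending.put(track(frame))
        finally:
            # Always signal completion so the back-end can exit.
            pending.put(sentinel)

    def back_end():
        while True:
            prior = pending.get()
            if prior is sentinel:
                break
            results.append(update(prior))

    threads = [
        threading.Thread(target=front_end),
        threading.Thread(target=back_end),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Toy stand-ins: "tracking" doubles the frame index, "update" adds one.
    print(run_pipeline(range(5), lambda f: f * 2, lambda p: p + 1))
```

With one producer and one consumer on a FIFO queue, results come back in frame order. The sketch leaves out error handling on the back-end side and any GPU work, both of which a real system would need.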