We introduce REDO, a class-agnostic framework to REconstruct Dynamic Objects from RGBD or calibrated videos. Compared to prior work, our problem setting is more realistic yet more challenging for three reasons: 1) due to occlusion or camera settings, an object of interest may never be entirely visible, yet we aim to reconstruct its complete shape; 2) we aim to handle different object dynamics, including rigid motion, non-rigid motion, and articulation; 3) we aim to reconstruct different categories of objects with one unified framework. To address these challenges, we develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues. Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation. We study the efficacy of REDO in extensive experiments on the synthetic RGBD video datasets SAIL-VOS 3D and DeformingThings4D++, and on the real-world video dataset 3DPW. We find that REDO outperforms state-of-the-art dynamic reconstruction methods by a margin. In ablation studies we validate each developed component.
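To make the pixel-aligned implicit-function idea concrete, here is a minimal NumPy sketch of how such a query could work. This is an illustration under our own assumptions, not the paper's implementation: the per-frame feature maps, the mean-pooling temporal aggregation, and the single linear layer standing in for the occupancy MLP are all hypothetical placeholders. A 3D query point is projected into each frame, features are bilinearly sampled at the projected pixel, aggregated across time, and decoded to an occupancy value.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(points, K):
    # Pinhole projection of camera-frame 3D points (N, 3) to pixel coords (N, 2).
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]

def bilinear_sample(feat, uv):
    # feat: (H, W, C) feature map; uv: (N, 2) pixel coords -> (N, C) features.
    H, W, _ = feat.shape
    u = np.clip(uv[:, 0], 0.0, W - 1.001)
    v = np.clip(uv[:, 1], 0.0, H - 1.001)
    u0, v0 = np.floor(u).astype(int), np.floor(v).astype(int)
    du, dv = u - u0, v - v0
    return (feat[v0, u0] * ((1 - du) * (1 - dv))[:, None]
            + feat[v0, u0 + 1] * (du * (1 - dv))[:, None]
            + feat[v0 + 1, u0] * ((1 - du) * dv)[:, None]
            + feat[v0 + 1, u0 + 1] * (du * dv)[:, None])

def query_occupancy(points, frame_feats, K, mlp_w, mlp_b):
    # Sample pixel-aligned features in every frame, aggregate over time
    # (mean pooling here), then decode feature + depth to an occupancy
    # probability with a tiny linear stand-in for the MLP decoder.
    uv = project(points, K)
    feats = np.mean([bilinear_sample(f, uv) for f in frame_feats], axis=0)
    x = np.concatenate([feats, points[:, 2:3]], axis=1)
    logits = x @ mlp_w + mlp_b
    return 1.0 / (1.0 + np.exp(-logits))
```

In a real system the feature maps would come from a learned image encoder and the aggregation would be learned rather than a plain mean, but the sketch captures the core mechanism: the implicit function is conditioned on features sampled at the query point's projection in each frame.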