Learning-based representations have become key to the success of many computer vision systems. While many 3D representations have been proposed, how to represent a dynamically changing 3D object remains an open problem. In this paper, we introduce a compositional representation for 4D captures, i.e., a 3D object deforming over a temporal span, that disentangles shape, initial state, and motion. Each component is represented by a latent code produced by a trained encoder. To model motion, a neural Ordinary Differential Equation (ODE) is trained to update the initial state conditioned on the learned motion code, and a decoder takes the shape code and the updated pose code to reconstruct the 4D capture at each time stamp. To encourage the network to decouple each component effectively, we propose an Identity Exchange Training (IET) strategy. Extensive experiments demonstrate that the proposed method outperforms existing state-of-the-art deep-learning-based methods on 4D reconstruction and yields significant improvements on downstream tasks, including motion transfer and motion completion.
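The pipeline described above — per-component latent codes, a neural ODE that evolves the pose code under a motion code, and a decoder that combines shape and pose — can be sketched as follows. This is a minimal illustrative mock-up, not the paper's implementation: the networks are untrained random maps, the latent dimensions are invented, and the ODE is integrated with a simple fixed-step Euler scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, out_dim):
    # A random single-layer map standing in for a trained network (illustrative only).
    W = rng.standard_normal((out_dim, in_dim)) * 0.1
    return lambda x: np.tanh(W @ x)

# Hypothetical latent dimensions; the paper's actual sizes are not given here.
SHAPE_D, POSE_D, MOTION_D = 8, 8, 4

dynamics = mlp(POSE_D + MOTION_D, POSE_D)  # f(pose, motion) -> d(pose)/dt
decoder = mlp(SHAPE_D + POSE_D, 3)         # (shape code, pose code) -> a toy 3D output

def integrate_pose(pose0, motion, t, steps=20):
    # Euler-integrate the neural ODE from time 0 to t, conditioned on the motion code.
    pose, dt = pose0.copy(), t / steps
    for _ in range(steps):
        pose = pose + dt * dynamics(np.concatenate([pose, motion]))
    return pose

# Codes that would come from the trained encoders; random here.
shape = rng.standard_normal(SHAPE_D)
pose0 = rng.standard_normal(POSE_D)
motion = rng.standard_normal(MOTION_D)

# Reconstruct at several time stamps by evolving only the pose code;
# the shape code stays fixed, which is what makes motion transfer possible:
# pairing one identity's shape code with another sequence's motion code.
frames = [decoder(np.concatenate([shape, integrate_pose(pose0, motion, t)]))
          for t in (0.0, 0.5, 1.0)]
print(len(frames), frames[0].shape)
```

Because shape, initial state, and motion live in separate codes, swapping one code between two sequences (as in the IET strategy) leaves the others intact, which is the mechanism behind motion transfer and completion.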