Obtaining photorealistic reconstructions of objects from sparse views is inherently ambiguous and can only be achieved by learning suitable reconstruction priors. Earlier works on sparse rigid object reconstruction successfully learned such priors from large datasets such as CO3D. In this paper, we extend this approach to dynamic objects. We use cats and dogs as a representative example and introduce Common Pets in 3D (CoP3D), a collection of crowd-sourced videos showing around 4,200 distinct pets. CoP3D is one of the first large-scale datasets for benchmarking non-rigid 3D reconstruction "in the wild". We also propose Tracker-NeRF, a method for learning 4D reconstruction from our dataset. At test time, given a small number of video frames of an unseen object, Tracker-NeRF predicts the trajectories of its 3D points and generates new views, interpolating viewpoint and time. Results on CoP3D reveal significantly better non-rigid new-view synthesis performance than existing baselines.