Video provides us with the spatio-temporal consistency needed for visual learning. Recent approaches have utilized this signal to learn correspondence estimation from nearby frame pairs. However, by relying only on nearby frame pairs, these approaches miss the richer long-range consistency between distant overlapping frames. To address this, we propose a self-supervised approach for correspondence estimation that learns from multiview consistency in short RGB-D video sequences. Our approach combines pairwise correspondence estimation and registration with a novel SE(3) transformation synchronization algorithm. Our key insight is that self-supervised multiview registration allows us to obtain correspondences over longer time frames, increasing both the diversity and difficulty of the sampled pairs. We evaluate our approach on indoor scenes for correspondence estimation and RGB-D point cloud registration and find that it performs on par with supervised approaches.
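To make the synchronization step concrete, below is a minimal sketch of rotation synchronization via spectral relaxation, the core operation in SE(3) transformation synchronization. The convention R_j ≈ R_ij R_i, the confidence weighting, the gauge fix, and the function name `synchronize_rotations` are assumptions made for illustration, not the paper's exact formulation; camera translations would be recovered afterwards, e.g., by weighted least squares once the rotations are fixed.

```python
# Illustrative sketch only: spectral rotation synchronization under the
# assumed convention R_j ≈ R_ij @ R_i. Not the paper's exact algorithm.
import numpy as np

def synchronize_rotations(pairwise, weights, n_views):
    """Estimate absolute rotations (relative to view 0) from noisy pairwise
    relative rotations R_ij and non-negative confidences w_ij."""
    A = np.zeros((3 * n_views, 3 * n_views))
    deg = np.zeros(n_views)
    for (i, j), R_ij in pairwise.items():
        w = weights[(i, j)]
        A[3*i:3*i+3, 3*j:3*j+3] = w * R_ij.T   # block (i, j) holds w * R_i R_j^T
        A[3*j:3*j+3, 3*i:3*i+3] = w * R_ij
        deg[i] += w
        deg[j] += w

    # Symmetric normalization: in the noise-free case the stacked absolute
    # rotations span the top-3 eigenspace (eigenvalue 1).
    d = np.repeat(1.0 / np.sqrt(np.maximum(deg, 1e-12)), 3)
    vals, vecs = np.linalg.eigh(d[:, None] * A * d[None, :])
    V = d[:, None] * vecs[:, -3:]              # undo the normalization

    # Resolve the global reflection ambiguity, then project blocks onto SO(3).
    dets = [np.linalg.det(V[3*i:3*i+3]) for i in range(n_views)]
    if np.mean(dets) < 0:
        V[:, -1] *= -1.0
    rotations = []
    for i in range(n_views):
        U, _, Vt = np.linalg.svd(V[3*i:3*i+3])
        S = np.diag([1.0, 1.0, np.linalg.det(U @ Vt)])
        rotations.append(U @ S @ Vt)

    # Fix the gauge so view 0 is the reference frame.
    R0 = rotations[0]
    return [R @ R0.T for R in rotations]
```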