Optical flow estimation is a crucial component of many vision tasks. Because ground-truth flow is not readily available in many cases, unsupervised learning supervised by view synthesis has emerged as a promising alternative to supervised methods. However, unsupervised learning is likely to be unstable when pixel tracking is lost due to occlusion or motion blur, or when pixel matching is impaired by variation in image content and spatial structure over time. In natural environments, dynamic occlusion and object variation are relatively slow temporal processes spanning several frames. We therefore explore optical flow estimation from multi-frame sequences of dynamic scenes, whereas most existing unsupervised approaches are based on temporally static models. We address unsupervised optical flow estimation with a temporal dynamic model by introducing a spatial-temporal dual recurrent block based on the predictive coding structure, which feeds the previous high-level motion prior to the current optical flow estimator. Assuming temporal smoothness of optical flow, we use the motion priors of adjacent frames to provide more reliable supervision in occluded regions. To capture the essence of challenging scenes, we simulate various scenarios across long sequences, including dynamic occlusion, content variation, and spatial variation, and adopt self-supervised distillation so that the model learns object motion patterns in a prolonged dynamic environment. Experiments on the KITTI 2012, KITTI 2015, Sintel Clean, and Sintel Final datasets demonstrate the effectiveness of our method for unsupervised optical flow estimation. The proposed method achieves state-of-the-art performance while offering advantages in memory overhead.
Title: Unsupervised Learning of Optical Flow in Multi-Frame Dynamic Environments Using Temporal Dynamic Modeling
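View-synthesis supervision, as named in the abstract, is commonly realized as a photometric loss: the second frame is warped back to the first using the estimated flow, and the reconstruction error serves as the training signal. The following is a minimal sketch of that general idea in plain Python, not the paper's implementation. The function names `bilinear_sample` and `photometric_loss` are hypothetical, images are assumed to be grayscale lists of rows, and occlusion handling is reduced to skipping pixels whose target falls outside the image.

```python
import math

def bilinear_sample(img, x, y):
    """Bilinearly interpolate img (a list of rows) at a real-valued (x, y)."""
    h, w = len(img), len(img[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * img[y0][x0] + wx * img[y0][x1]
    bottom = (1 - wx) * img[y1][x0] + wx * img[y1][x1]
    return (1 - wy) * top + wy * bottom

def photometric_loss(frame1, frame2, flow):
    """Mean absolute error between frame1 and frame2 warped back by flow.

    flow[y][x] = (u, v) means pixel (x, y) in frame1 is assumed to appear
    at (x + u, y + v) in frame2. Pixels whose target leaves the image are
    skipped (a crude stand-in for occlusion masking; an assumption here).
    """
    h, w = len(frame1), len(frame1[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            u, v = flow[y][x]
            tx, ty = x + u, y + v
            if 0 <= tx <= w - 1 and 0 <= ty <= h - 1:
                total += abs(frame1[y][x] - bilinear_sample(frame2, tx, ty))
                count += 1
    return total / count if count else 0.0

# Toy data: frame1 is frame2 with content moved one pixel, so u = 1, v = 0.
frame2 = [[float((3 * x + 5 * y) % 7) for x in range(6)] for y in range(4)]
frame1 = [row[1:] + row[:1] for row in frame2]  # frame1[y][x] == frame2[y][x + 1]
true_flow = [[(1.0, 0.0)] * 6 for _ in range(4)]
zero_flow = [[(0.0, 0.0)] * 6 for _ in range(4)]

loss_true = photometric_loss(frame1, frame2, true_flow)
loss_zero = photometric_loss(frame1, frame2, zero_flow)
```

On this toy pair the loss under `true_flow` is lower than under `zero_flow`, which is the property an unsupervised estimator exploits. A practical system would vectorize the warp on a GPU and typically replace the out-of-bounds check with a learned or geometric occlusion mask.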