RealFlow: EM-based Realistic Optical Flow Dataset Generation from Videos

Obtaining ground-truth optical flow labels from video is challenging because manual annotation of pixel-wise flow is prohibitively expensive and laborious. Existing approaches instead adapt models trained on synthetic datasets to real videos, which inevitably suffers from domain discrepancy and limits performance in real-world applications. To solve these problems, we propose RealFlow, an Expectation-Maximization based framework that can create large-scale optical flow datasets directly from any unlabeled realistic videos. Specifically, we first estimate optical flow between a pair of video frames, and then synthesize a new image from this pair based on the predicted flow. The new image pairs and their corresponding flows can thus be regarded as a new training set. In addition, we design a Realistic Image Pair Rendering (RIPR) module that adopts softmax splatting and bi-directional hole filling to alleviate artifacts of the image synthesis. In the E-step, RIPR renders new images to create a large quantity of training data. In the M-step, we use the generated training data to train an optical flow network, which is then used to estimate optical flows in the next E-step. Over the iterative learning steps, the capability of the flow network gradually improves, and with it the accuracy of the flow and the quality of the synthesized dataset. Experimental results show that RealFlow outperforms previous dataset generation methods by a considerable margin. Moreover, based on the generated dataset, our approach achieves state-of-the-art performance on two standard benchmarks compared with both supervised and unsupervised optical flow methods. Our code and dataset are available at https://github.com/megvii-research/RealFlow
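The E-step/M-step alternation described above can be illustrated with a minimal NumPy sketch. This is not the released RealFlow code: the function names (`softmax_splat`, `realflow_round`, `realflow_em`), the nearest-pixel splatting, the uniform importance map, and the hole filling that simply copies pixels from the second frame are all simplifying assumptions made for illustration. The flow estimator and the training routine are passed in as hypothetical callables.

```python
import numpy as np

def softmax_splat(image, flow, importance):
    """Forward-warp image (H, W, C) along flow (H, W, 2 as dx, dy).

    Simplified nearest-pixel variant: source pixels landing on the same
    target are blended with softmax weights derived from `importance` (H, W).
    Returns the warped image and a boolean mask of unfilled pixels (holes).
    """
    h, w, c = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + flow[..., 0]).astype(int)
    ty = np.rint(ys + flow[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    weights = np.exp(importance - importance.max())  # numerically stable
    num = np.zeros((h, w, c))
    den = np.zeros((h, w))
    # Accumulate weighted colors and weights at each target pixel.
    np.add.at(num, (ty[valid], tx[valid]), image[valid] * weights[valid][:, None])
    np.add.at(den, (ty[valid], tx[valid]), weights[valid])
    holes = den == 0
    den_safe = np.where(holes, 1.0, den)
    return num / den_safe[..., None], holes

def realflow_round(frame_pairs, estimate_flow):
    """E-step sketch: render (I1, new I2, flow) training triplets."""
    dataset = []
    for i1, i2 in frame_pairs:
        flow = estimate_flow(i1, i2)
        importance = np.zeros(i1.shape[:2])  # assumption: uniform importance
        new_i2, holes = softmax_splat(i1, flow, importance)
        # Assumption: crude stand-in for bi-directional hole filling.
        new_i2[holes] = i2[holes]
        dataset.append((i1, new_i2, flow))
    return dataset

def realflow_em(frame_pairs, estimate_flow, train, rounds=2):
    """Alternate rendering (E-step) and network training (M-step)."""
    dataset = []
    for _ in range(rounds):
        dataset = realflow_round(frame_pairs, estimate_flow)  # E-step
        estimate_flow = train(estimate_flow, dataset)         # M-step
    return estimate_flow, dataset
```

Because the new second frame is rendered from the predicted flow, that flow serves as the label for the rendered pair by construction, which is why a better estimator in one round would be expected to yield a more realistic dataset in the next.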
