Supervised training has led to state-of-the-art results in image and video denoising. However, its application to real data is limited, since it requires large datasets of noisy-clean pairs that are difficult to obtain. For this reason, networks are often trained on realistic synthetic data. More recently, self-supervised frameworks have been proposed for training such denoising networks directly on noisy data, without requiring ground truth. On synthetic denoising problems, supervised training outperforms self-supervised approaches; in recent years, however, the gap has narrowed, especially for video. In this paper, we present a study that aims to determine the best approach for training denoising networks for real raw videos: supervision on realistic synthetic data or self-supervision on real data. A complete study with quantitative results on natural videos with real motion is impossible, since no such dataset with clean-noisy pairs exists. We address this issue with three independent experiments in which we compare the two frameworks. We find that self-supervision on real data outperforms supervision on synthetic data, and that under normal illumination conditions the drop in performance is due to the synthetic ground truth generation, not the noise model.