Existing video denoising methods typically assume that noisy videos are degraded from clean videos by adding Gaussian noise. However, deep models trained under such a degradation assumption inevitably perform poorly on real videos due to degradation mismatch. Although some studies attempt to train deep models on noisy and noise-free video pairs captured by cameras, such models work well only for the specific cameras used and generalize poorly to other videos. In this paper, we propose to lift this limitation and focus on the problem of general real video denoising, with the aim of generalizing well to unseen real-world videos. We tackle this problem by first investigating the common behaviors of video noise and observe two important characteristics: 1) downscaling helps to reduce the noise level in the spatial domain, and 2) information from adjacent frames helps to remove the noise of the current frame in the temporal domain. Motivated by these two observations, we propose a multi-scale recurrent architecture that makes full use of both characteristics. Second, we propose a synthetic real-noise degradation model that randomly shuffles different noise types to train the denoising model. With a synthesized and enriched degradation space, our degradation model helps to bridge the distribution gap between training data and real-world data. Extensive experiments demonstrate that our proposed method achieves state-of-the-art performance and better generalization ability than existing methods on both synthetic Gaussian denoising and practical real video denoising.
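To make the shuffled-degradation idea concrete, below is a minimal sketch of what such a pipeline might look like. The specific noise types, parameter ranges, and whether every noise type is applied in each pass are assumptions for illustration; the abstract only states that different noise types are applied in a randomly shuffled order.

```python
import random
import numpy as np

# Hypothetical per-frame noise injectors. The actual noise models and
# parameter ranges used in the paper are not specified in the abstract.
def add_gaussian(frame, sigma_range=(5, 50)):
    sigma = random.uniform(*sigma_range)
    return frame + np.random.normal(0.0, sigma / 255.0, frame.shape)

def add_poisson(frame, scale_range=(100, 1000)):
    # Signal-dependent shot noise: sample photon counts, then renormalize.
    scale = random.uniform(*scale_range)
    return np.random.poisson(np.clip(frame, 0, 1) * scale) / scale

def add_speckle(frame, sigma_range=(5, 50)):
    sigma = random.uniform(*sigma_range)
    return frame * (1.0 + np.random.normal(0.0, sigma / 255.0, frame.shape))

NOISE_OPS = [add_gaussian, add_poisson, add_speckle]

def degrade(clean_frame):
    """Apply candidate noise types in a randomly shuffled order, each with
    randomly sampled strength, to enrich the synthetic degradation space."""
    ops = NOISE_OPS[:]
    random.shuffle(ops)
    noisy = clean_frame.astype(np.float32)
    for op in ops:
        noisy = op(noisy)
    return np.clip(noisy, 0.0, 1.0)
```

Because the order and strength of the injected noises vary per sample, the training set covers a much broader degradation space than fixed Gaussian corruption, which is the mechanism the abstract credits for narrowing the gap to real-world noise distributions.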