In this paper, we propose a learning-based approach for denoising raw videos captured under low-light conditions. We first explicitly align the neighboring frames to the current frame using a convolutional neural network (CNN), and then fuse the registered frames with a second CNN to obtain the final denoised frame. To avoid directly aligning temporally distant frames, we perform alignment and fusion in multiple stages. Specifically, at each stage we denoise every set of three consecutive input frames to generate intermediate denoised frames, which are then passed as input to the next stage. By repeating this process over multiple stages, we can effectively utilize the information in neighboring frames without directly aligning temporally distant frames. We train our multi-stage system using an adversarial loss with a conditional discriminator. Specifically, we condition the discriminator on a soft gradient mask to prevent it from introducing high-frequency artifacts in smooth regions. We show that our system produces temporally coherent videos with realistic details. Furthermore, we demonstrate through extensive experiments that our approach outperforms state-of-the-art image and video denoising methods both numerically and visually.
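To make the multi-stage pipeline concrete, the following is a minimal sketch in PyTorch. The abstract does not specify the architectures, so `AlignNet`, `FuseNet`, and the stage recursion below are illustrative stand-ins under the stated assumptions (packed 4-channel raw input, a sliding window of three frames per stage), not the paper's actual networks.

```python
# Minimal sketch of multi-stage alignment and fusion for raw video denoising.
# AlignNet / FuseNet are hypothetical stand-in CNNs, not the paper's models.
import torch
import torch.nn as nn


class AlignNet(nn.Module):
    """Registers a neighboring frame to the center frame (stand-in CNN)."""
    def __init__(self, channels=4):  # 4 channels for packed raw Bayer input
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, neighbor, center):
        # Predict the registered neighbor from the concatenated pair.
        return self.net(torch.cat([neighbor, center], dim=1))


class FuseNet(nn.Module):
    """Fuses two registered neighbors with the center frame (stand-in CNN)."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 * channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, prev_aligned, center, next_aligned):
        return self.net(torch.cat([prev_aligned, center, next_aligned], dim=1))


def denoise_stage(frames, align, fuse):
    """One stage: denoise each frame from a window of 3 consecutive frames."""
    out = []
    for t in range(len(frames)):
        prev_f = frames[max(t - 1, 0)]          # replicate at sequence edges
        next_f = frames[min(t + 1, len(frames) - 1)]
        center = frames[t]
        out.append(fuse(align(prev_f, center), center, align(next_f, center)))
    return out


def denoise_video(frames, align, fuse, num_stages=2):
    """After k stages, frame t has drawn on frames t-k..t+k, yet each stage
    only ever aligned immediate neighbors, never temporally distant frames."""
    for _ in range(num_stages):
        frames = denoise_stage(frames, align, fuse)
    return frames
```

Stacking stages is what widens the temporal receptive field: each stage only performs short-range alignment, which is easier than warping a distant frame directly.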
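The abstract also does not specify how the soft gradient mask is constructed, so the sketch below assumes one plausible construction: a blurred Sobel-gradient magnitude of the reference frame, normalized to [0, 1], so that smooth regions receive low weight when the mask is concatenated to the discriminator input.

```python
# Hedged sketch of a soft gradient mask for conditioning the discriminator.
# The mask construction here (Sobel magnitude + smoothing) is an assumption.
import torch
import torch.nn.functional as F


def soft_gradient_mask(img, blur_kernel=5):
    """img: (N, C, H, W) reference frame. Returns a (N, 1, H, W) mask in [0, 1]."""
    gray = img.mean(dim=1, keepdim=True)
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                           device=img.device).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(gray, sobel_x, padding=1)
    gy = F.conv2d(gray, sobel_y, padding=1)
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    # Soften with average pooling, then normalize per image to [0, 1].
    mag = F.avg_pool2d(mag, blur_kernel, stride=1, padding=blur_kernel // 2)
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + 1e-8)


# The conditional discriminator would then see the (real or generated) frame
# together with the mask, e.g. D(torch.cat([frame, mask], dim=1)), so the
# adversarial loss encourages detail only where the mask indicates texture.
```

Conditioning on such a mask gives the discriminator an explicit cue that low-gradient regions should stay smooth, which is why the abstract credits it with suppressing high-frequency artifacts there.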