To facilitate video denoising research, we construct the Practical Video Denoising Dataset (PVDD), which contains 200 noisy-clean dynamic video pairs in both sRGB and RAW formats. Whereas existing datasets contain limited motion information, PVDD covers dynamic scenes with varied, natural motion. Whereas existing datasets primarily synthesize noise in the sRGB domain from Gaussian or Poisson distributions, PVDD synthesizes realistic noise in the RAW domain using a physically meaningful sensor noise model followed by ISP processing. We also propose a new video denoising framework, the Recurrent Video Denoising Transformer (RVDT), which achieves state-of-the-art (SOTA) performance on PVDD and other current video denoising benchmarks. RVDT consists of spatial and temporal transformer blocks that perform denoising with long-range operations along the spatial dimension and long-term propagation along the temporal dimension. In particular, RVDT uses the attention mechanism to implement bi-directional feature propagation with both implicit and explicit temporal modeling. Extensive experiments demonstrate that 1) models trained on PVDD achieve better denoising performance on many challenging real-world videos than models trained on other existing datasets; and 2) when trained on the same dataset, RVDT achieves better denoising performance than other types of networks.
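As a rough illustration of the data pipeline described above (noise synthesized in the RAW domain, then an ISP applied to obtain the sRGB pair), the following minimal sketch may help. It is not the PVDD implementation. The noise model here is an assumption: a heteroscedastic Gaussian approximation of shot plus read noise, with variance `shot_gain * signal + read_std ** 2`. The names `add_raw_noise`, `toy_isp`, and `make_pair`, the parameter values, and the gamma-only ISP are all hypothetical stand-ins; a real ISP would also demosaic, white-balance, and color-correct.

```python
import random


def add_raw_noise(raw, shot_gain, read_std, rng):
    """Add signal-dependent noise to linear RAW values in [0, 1].

    Assumed model (not taken from the paper): Gaussian noise whose
    variance is shot_gain * signal + read_std ** 2.
    """
    noisy = []
    for x in raw:
        sigma = (shot_gain * x + read_std ** 2) ** 0.5
        # Clip back to the valid sensor range after adding noise.
        noisy.append(min(1.0, max(0.0, x + rng.gauss(0.0, sigma))))
    return noisy


def toy_isp(raw, gamma=2.2):
    """Hypothetical stand-in ISP: gamma encoding only."""
    return [x ** (1.0 / gamma) for x in raw]


def make_pair(clean_raw, shot_gain=0.01, read_std=0.02, seed=0):
    """Build (noisy, clean) pairs in both RAW and sRGB-like form."""
    rng = random.Random(seed)
    noisy_raw = add_raw_noise(clean_raw, shot_gain, read_std, rng)
    return {
        "raw": (noisy_raw, clean_raw),
        "srgb": (toy_isp(noisy_raw), toy_isp(clean_raw)),
    }
```

Calling `make_pair([0.0, 0.25, 0.5, 1.0])` returns a dictionary holding one noisy-clean pair per domain. Because the noise is injected before the ISP, the sRGB noise inherits the ISP's nonlinearity, which is the motivation the abstract gives for synthesizing in the RAW domain.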