Video restoration, which aims to restore clear frames from degraded videos, has been attracting increasing attention. It requires establishing temporal correspondences across multiple misaligned frames. To that end, existing deep methods generally adopt complicated network architectures that integrate optical flow, deformable convolution, or cross-frame and cross-pixel self-attention layers, resulting in high computational cost. We argue that, with proper design, temporal information in video restoration can be utilized much more efficiently and effectively. In this study, we propose a simple, fast, yet effective framework for video restoration. The key to our framework is the grouped spatial-temporal shift, a simple and lightweight operation that nevertheless implicitly establishes inter-frame correspondences and achieves multi-frame aggregation. Coupled with basic 2D U-Nets for frame-wise encoding and decoding, this efficient spatial-temporal shift module can effectively tackle the challenges of video restoration. Extensive experiments show that our framework surpasses the previous state-of-the-art method on both video deblurring and video denoising with only 43% of its computational cost.
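To make the shift operation concrete, below is a minimal PyTorch sketch of a grouped spatial-temporal shift in the spirit described above: channel groups borrow features from neighboring frames at group-specific spatial displacements, and a plain 2D convolution fuses the result frame-wise. The class name, the offset values, the use of circular shifts, and the residual fusion are all illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class GroupedSpatialTemporalShift(nn.Module):
    """Illustrative grouped spatial-temporal shift (not the authors' exact module).

    Channels are split into groups; half the groups borrow features from the
    previous frame and half from the next frame, each displaced by a
    group-specific spatial offset. A lightweight 2D conv then fuses the
    shifted features per frame, so no flow or attention is needed.
    """

    def __init__(self, channels, offsets=((-4, 0), (4, 0), (0, -4), (0, 4))):
        super().__init__()
        assert channels % (len(offsets) * 2) == 0, "channels must split evenly into groups"
        self.offsets = offsets
        self.fuse = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        # x: (B, T, C, H, W) -- a clip of per-frame feature maps
        b, t, c, h, w = x.shape
        groups = x.chunk(len(self.offsets) * 2, dim=2)
        shifted = []
        for i, g in enumerate(groups):
            dy, dx = self.offsets[i % len(self.offsets)]
            # temporal shift: the first half of the groups looks one frame back,
            # the second half one frame ahead (circular at clip boundaries)
            step = -1 if i < len(self.offsets) else 1
            g = torch.roll(g, shifts=step, dims=1)
            # spatial shift: displace the borrowed features by (dy, dx)
            # (circular shift used here for simplicity; zero-padding is an alternative)
            g = torch.roll(g, shifts=(dy, dx), dims=(3, 4))
            shifted.append(g)
        out = torch.cat(shifted, dim=2)
        # frame-wise fusion with a basic 2D conv, as in the 2D U-Net setting
        out = self.fuse(out.view(b * t, c, h, w)).view(b, t, c, h, w)
        return x + out  # residual connection keeps the module lightweight


# usage: a clip of 5 frames with 32-channel features
x = torch.randn(2, 5, 32, 64, 64)
y = GroupedSpatialTemporalShift(32)(x)
assert y.shape == x.shape
```

Because the module consists only of channel splits, `torch.roll`, and a single 3x3 convolution, its cost is essentially that of a frame-wise 2D conv, which is what makes this style of temporal aggregation cheap compared to flow-based or attention-based alignment.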