Video super-resolution (VSR) is the task of restoring high-resolution frames from a sequence of low-resolution inputs. Unlike single-image super-resolution, VSR can exploit the temporal information across frames to reconstruct results with more details. Recently, with the rapid development of convolutional neural networks (CNNs), the VSR task has drawn increasing attention and many CNN-based methods have achieved remarkable results. However, only a few VSR approaches can be deployed on real-world mobile devices due to limits on computational resources and runtime. In this paper, we propose a \textit{Sliding Window based Recurrent Network} (SWRN) that can run inference in real time while still achieving superior performance. Specifically, we observe that video frames exhibit both spatial and temporal relations that can help to recover details, and the key question is how to extract and aggregate this information. To address it, we input three neighboring frames and utilize a hidden state to recurrently store and update the important temporal information. Our experiments on the REDS dataset show that the proposed method adapts well to mobile devices and produces visually pleasant results.
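The sliding-window recurrence described above can be illustrated with a minimal sketch. This is not the paper's actual network: `upscale` is a hypothetical stand-in for the reconstruction branch (here just nearest-neighbor upsampling), and the fusion of the three-frame window with the hidden state is a placeholder average rather than learned convolutions. The sketch only shows the control flow: at each step, three neighboring frames are fused with a recurrently updated hidden state.

```python
import numpy as np

def upscale(frame, scale=4):
    # Hypothetical stand-in for the network's reconstruction branch:
    # nearest-neighbor upsampling of an (H, W) frame.
    return frame.repeat(scale, axis=0).repeat(scale, axis=1)

def swrn_sketch(frames, scale=4):
    """Sliding-window recurrent pass over a (T, H, W) clip: at step t,
    fuse frames t-1, t, t+1 with a hidden state carried across steps."""
    T, H, W = frames.shape
    hidden = np.zeros((H, W))  # recurrent state storing temporal information
    outputs = []
    for t in range(T):
        # Three-neighboring-frame window, clipped at the clip boundaries.
        window = frames[max(t - 1, 0):min(t + 2, T)]
        # Placeholder fusion of spatial (window) and temporal (hidden) cues;
        # in a real network this would be learned convolutional layers.
        fused = window.mean(axis=0) + 0.5 * hidden
        hidden = fused  # recurrently update the hidden state
        outputs.append(upscale(fused, scale))
    return np.stack(outputs)

frames = np.random.rand(5, 8, 8)  # 5 low-resolution 8x8 frames
sr = swrn_sketch(frames)
print(sr.shape)  # (5, 32, 32): each frame upscaled 4x
```

Because the hidden state is updated once per frame, the per-step cost stays constant regardless of clip length, which is the property that makes this recurrence attractive for mobile inference.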