Video Super-Resolution (VSR) aims to recover sequences of high-resolution (HR) frames from their low-resolution (LR) counterparts. Previous methods mainly utilize temporally adjacent frames to assist the reconstruction of the target frame. However, in real-world videos with fast scene switching, adjacent frames contain a great deal of irrelevant information, and these VSR methods cannot adaptively distinguish and select the useful information. In contrast, building on a transformer structure well suited to temporal tasks, we devise a novel adaptive scenario video super-resolution method. Specifically, we use optical flow to label the patches in each video frame and compute attention only among patches that share the same label. We then select the most relevant label to supplement the spatial-temporal information of the target frame. This design ensures that the supplementary information comes from the same scene as much as possible. We further propose a cross-scale feature aggregation module to better handle the scale-variation problem. Compared with other video super-resolution methods, our method not only achieves significant performance gains on single-scene videos but also shows better robustness on cross-scene datasets.
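To make the label-restricted attention idea concrete, the following is a minimal sketch (not the authors' implementation): patches that share an optical-flow-derived scene label attend only to one another via a boolean mask on the attention scores. The function name, the assumption that `labels` comes from clustering optical flow, and the omission of separate query/key/value projections are all illustrative simplifications.

```python
import torch
import torch.nn.functional as F

def label_masked_attention(patch_feats: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of attention restricted to same-label patches.

    patch_feats: (N, C) features of N patches gathered across frames.
    labels: (N,) integer scene labels, e.g. derived from optical flow.
    Returns attention-aggregated features of shape (N, C).
    """
    q = k = v = patch_feats                       # learned projections omitted for brevity
    scores = q @ k.t() / (q.shape[-1] ** 0.5)     # (N, N) scaled dot-product scores
    same_label = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) boolean mask
    scores = scores.masked_fill(~same_label, float('-inf'))   # block cross-label attention
    attn = F.softmax(scores, dim=-1)              # rows normalize within each label group
    return attn @ v                               # aggregate only same-scene information

# Toy usage: 6 patches split into two scenes; features mix only within a scene.
feats = torch.randn(6, 32)
labels = torch.tensor([0, 0, 0, 1, 1, 1])
out = label_masked_attention(feats, labels)
print(out.shape)  # torch.Size([6, 32])
```

Because every patch shares a label with itself, each row of the mask has at least one valid entry, so the softmax is always well defined even when a scene contains a single patch.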