Existing video super-resolution methods typically use only a few neighboring frames to generate a higher-resolution image for each frame, leaving the redundant information shared with distant frames largely unexploited: corresponding patches of the same instance appear across distant frames at different scales. Based on this observation, we propose a video super-resolution method with long-term cross-scale aggregation that leverages similar patches (self-exemplars) across distant frames. Our model also includes a multi-reference alignment module that fuses the features derived from these self-exemplars in distant reference frames to perform high-quality super-resolution. In addition, we propose a novel and practical training strategy for reference-based super-resolution. To evaluate the proposed method, we conduct extensive experiments on our collected CarCam dataset and the Waymo Open dataset; the results demonstrate that our method outperforms state-of-the-art methods. Our source code will be publicly available.
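To make the core idea concrete, below is a minimal sketch of what cross-scale self-exemplar search might look like: for each patch in the current frame's features, the most similar patch in a distant reference frame is retrieved over several candidate scales. This is an illustrative assumption, not the paper's actual implementation; the function name, the use of cosine similarity over unfolded patches, and the scale set are all hypothetical choices.

```python
# Hypothetical sketch of cross-scale self-exemplar search (not the paper's code).
# Assumes cosine similarity between unfolded feature patches; the scale set and
# patch size are illustrative.
import torch
import torch.nn.functional as F

def cross_scale_match(feat_cur, feat_ref, patch_size=3, scales=(1.0, 2.0, 4.0)):
    """For each patch in feat_cur (B, C, H, W), find the most similar patch in
    feat_ref, searching the reference at several scales."""
    B, C, H, W = feat_cur.shape
    # Unfold the current frame into L2-normalized patch descriptors.
    q = F.unfold(feat_cur, patch_size, padding=patch_size // 2)    # B, C*k*k, H*W
    q = F.normalize(q, dim=1)
    best_sim = torch.full((B, H * W), -2.0, device=feat_cur.device)
    best_patch = torch.zeros_like(q)
    for s in scales:
        # Downscale the reference so instances that appear larger in the
        # distant frame align with the current frame's patch scale.
        ref_s = F.interpolate(feat_ref, scale_factor=1.0 / s,
                              mode='bilinear', align_corners=False)
        k = F.unfold(ref_s, patch_size, padding=patch_size // 2)   # B, C*k*k, N_s
        k = F.normalize(k, dim=1)
        sim = torch.bmm(q.transpose(1, 2), k)                      # B, H*W, N_s
        val, idx = sim.max(dim=2)                                  # best match per query
        improved = val > best_sim
        # Keep the matched reference patches where this scale improves similarity.
        gathered = torch.gather(k, 2, idx.unsqueeze(1).expand(-1, k.size(1), -1))
        best_patch = torch.where(improved.unsqueeze(1), gathered, best_patch)
        best_sim = torch.where(improved, val, best_sim)
    # Fold matched patches back into a feature map for later fusion
    # (overlapping contributions are summed; a real implementation would
    # normalize by the per-pixel patch count).
    out = F.fold(best_patch, (H, W), patch_size, padding=patch_size // 2)
    return out, best_sim.view(B, H, W)
```

In this sketch, the returned feature map would then be passed, together with features from other reference frames, to an alignment-and-fusion stage analogous to the multi-reference alignment module described above.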