Video super-resolution (VSR) has many applications that impose strict causal, real-time, and latency constraints, including video streaming and TV. We address the VSR problem under these settings, which pose additional challenges because information from future frames is unavailable. In particular, designing efficient yet effective frame alignment and fusion modules remains a central problem. In this work, we propose a recurrent VSR architecture based on a deformable attention pyramid (DAP). Our DAP aligns and integrates information from the recurrent state into the current frame prediction. To circumvent the computational cost of traditional attention-based methods, we attend only to a limited number of spatial locations, which are dynamically predicted by the DAP. Comprehensive experiments and an analysis of the proposed key innovations show the effectiveness of our approach. We significantly reduce processing time in comparison to state-of-the-art methods while maintaining high performance. We surpass the state-of-the-art method EDVR-M on two standard benchmarks with a speed-up of over 3x.
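The idea of attending to only a few dynamically chosen locations can be illustrated with a minimal NumPy sketch. This is not the DAP itself: the function names, the single-scale (non-pyramid) layout, the dot-product scoring, and the assumption that per-pixel offsets are supplied by some upstream predictor are all illustrative choices made here. For each pixel of the current frame's features, the sketch samples K offset locations from the recurrent hidden state and fuses them with softmax weights, so the cost grows with H·W·K rather than (H·W)² as in dense attention.

```python
import numpy as np

def bilinear_sample(feat, ys, xs):
    """Sample feat (H, W, C) at fractional coords ys, xs, clamping to the border."""
    H, W, _ = feat.shape
    ys = np.clip(ys, 0, H - 1)
    xs = np.clip(xs, 0, W - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (ys - y0)[..., None]  # vertical interpolation weight
    wx = (xs - x0)[..., None]  # horizontal interpolation weight
    top = feat[y0, x0] * (1 - wx) + feat[y0, x1] * wx
    bot = feat[y1, x0] * (1 - wx) + feat[y1, x1] * wx
    return top * (1 - wy) + bot * wy

def sparse_deformable_attention(query, hidden, offsets):
    """Fuse a recurrent state into the current frame at K locations per pixel.

    query:   (H, W, C) features of the current frame
    hidden:  (H, W, C) recurrent state from the previous step
    offsets: (H, W, K, 2) hypothetical predicted (dy, dx) offsets per pixel
    returns: (H, W, C) attended features
    """
    H, W, C = query.shape
    gy, gx = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    ys = gy[..., None] + offsets[..., 0]       # (H, W, K) sampling rows
    xs = gx[..., None] + offsets[..., 1]       # (H, W, K) sampling columns
    sampled = bilinear_sample(hidden, ys, xs)  # (H, W, K, C)
    # Scaled dot-product scores over the K sampled locations only.
    scores = np.einsum("hwc,hwkc->hwk", query, sampled) / np.sqrt(C)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return np.einsum("hwk,hwkc->hwc", weights, sampled)
```

With all offsets set to zero, every sampled location coincides with the pixel itself, so the output reduces to the hidden state; nonzero offsets let each pixel pull information from displaced positions, which is what makes the operation usable for alignment.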