Spatial-Temporal Video Super-Resolution (ST-VSR) aims to generate high-quality videos with higher resolution and higher frame rates. Existing advanced methods accomplish the ST-VSR task by combining spatial and temporal video super-resolution (S-VSR and T-VSR). These methods require separate alignment and fusion steps in both S-VSR and T-VSR, which is redundant and fails to fully exploit the information flow across consecutive low-resolution (LR) frames. Although bidirectional learning (future-to-past and past-to-future) has been introduced to cover all input frames, directly fusing the final predictions fails to exploit the intrinsic correlations of bidirectional motion learning and the spatial information from all frames. We propose an effective yet efficient recurrent network with bidirectional interaction for ST-VSR, where only one alignment and fusion is needed. Specifically, it first performs backward inference from future to past, and then performs forward inference to super-resolve intermediate frames. The backward and forward inferences are assigned to learn structures and details, respectively, which simplifies the learning task through joint optimization. Furthermore, a Hybrid Fusion Module (HFM) is designed to aggregate and distill information, refining spatial features and reconstructing high-quality video frames. Extensive experiments on two public datasets demonstrate that our method outperforms state-of-the-art methods in efficiency, reducing the computational cost by about 22%.
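As a rough illustration of the propagation scheme described above, the following PyTorch sketch runs a backward (future-to-past) recurrent pass, then a forward (past-to-future) pass that fuses both directions once before reconstruction. All module internals are hypothetical stand-ins: `RecurrentCell` is a plain convolutional fusion, and the `hfm` convolution only stands in for the Hybrid Fusion Module; the temporal-interpolation branch that synthesizes intermediate frames is omitted for brevity, so this is a minimal sketch, not the paper's architecture.

```python
# Minimal sketch of bidirectional recurrent propagation for VSR.
# All module internals are hypothetical stand-ins, not the paper's design.
import torch
import torch.nn as nn


class RecurrentCell(nn.Module):
    """Hypothetical cell: fuses the current LR frame with the propagated
    hidden state (one fusion step per direction)."""

    def __init__(self, ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 + ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, frame, state):
        return self.fuse(torch.cat([frame, state], dim=1))


class BidirectionalSTVSR(nn.Module):
    """Backward inference first, then forward inference; a stand-in
    hybrid-fusion convolution aggregates both directions per frame."""

    def __init__(self, ch=64, scale=4):
        super().__init__()
        self.backward_cell = RecurrentCell(ch)
        self.forward_cell = RecurrentCell(ch)
        # Stand-in for the Hybrid Fusion Module (HFM).
        self.hfm = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.upsample = nn.Sequential(
            nn.Conv2d(ch, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, lr_frames):
        # lr_frames: (B, T, 3, H, W) low-resolution sequence.
        b, t, _, h, w = lr_frames.shape
        ch = self.hfm.out_channels

        # Backward inference: propagate a hidden state future-to-past.
        state = lr_frames.new_zeros(b, ch, h, w)
        backward_feats = [None] * t
        for i in range(t - 1, -1, -1):
            state = self.backward_cell(lr_frames[:, i], state)
            backward_feats[i] = state

        # Forward inference: propagate past-to-future, fuse both
        # directions once, and reconstruct each HR frame.
        state = lr_frames.new_zeros(b, ch, h, w)
        outputs = []
        for i in range(t):
            state = self.forward_cell(lr_frames[:, i], state)
            fused = self.hfm(torch.cat([state, backward_feats[i]], dim=1))
            outputs.append(self.upsample(fused))
        return torch.stack(outputs, dim=1)  # (B, T, 3, s*H, s*W)


# Usage: 5 LR frames of size 64x64 -> 5 HR frames at 4x resolution.
model = BidirectionalSTVSR()
hr = model(torch.randn(1, 5, 3, 64, 64))
print(hr.shape)  # torch.Size([1, 5, 3, 256, 256])
```

Note the design point the abstract emphasizes: each output frame is aligned and fused only once (in the `hfm` step of the forward pass), rather than once in T-VSR and again in S-VSR.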