The goal of space-time video super-resolution (STVSR) is to increase the spatial and temporal resolution of low-resolution (LR), low frame rate (LFR) videos. Recent deep-learning approaches have made significant progress, but most of them use only two adjacent frames, i.e., short-term features, to synthesize the missing frame embedding, and thus cannot fully exploit the information flow across consecutive input LR frames. In addition, existing STVSR models rarely exploit temporal contexts explicitly to assist high-resolution (HR) frame reconstruction. To address these issues, in this paper we propose a deformable attention network, STDAN, for STVSR. First, we devise a long-short term feature interpolation (LSTFI) module that mines abundant content from a wider range of neighboring input frames for the interpolation process through a bidirectional RNN structure. Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts in dynamic video frames are adaptively captured and aggregated to enhance SR reconstruction. Experimental results on several datasets demonstrate that our approach outperforms state-of-the-art STVSR methods. The code is available at https://github.com/littlewhitesea/STDAN.
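To make the described pipeline concrete, below is a minimal structural sketch in PyTorch of the two stages named in the abstract (long-short term feature interpolation followed by spatial-temporal feature aggregation, then upsampling). It is not the authors' implementation: all class names (`LSTFIStub`, `STDFAStub`, `STDANSketch`), channel sizes, and module internals are illustrative placeholders, and the deformable attention of the real STDFA module is replaced here by a simple temporal attention fusion. The official code is at https://github.com/littlewhitesea/STDAN.

```python
# Structural sketch only; not the authors' code. All internals are placeholders.
import torch
import torch.nn as nn


class LSTFIStub(nn.Module):
    """Placeholder long-short term feature interpolation: a bidirectional
    recurrent pass over LR frame features, then synthesis of one intermediate
    feature between each consecutive pair (so every interpolated feature can
    draw on all neighboring frames, not just the two adjacent ones)."""
    def __init__(self, c=64):
        super().__init__()
        self.fwd = nn.Conv2d(2 * c, c, 3, padding=1)
        self.bwd = nn.Conv2d(2 * c, c, 3, padding=1)
        self.mix = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, feats):                      # list of N tensors (B, C, H, W)
        h = torch.zeros_like(feats[0]); f_states = []
        for f in feats:                            # forward recurrence
            h = torch.relu(self.fwd(torch.cat([f, h], 1))); f_states.append(h)
        h = torch.zeros_like(feats[0]); b_states = []
        for f in reversed(feats):                  # backward recurrence
            h = torch.relu(self.bwd(torch.cat([f, h], 1))); b_states.append(h)
        b_states.reverse()
        out = []
        for i in range(len(feats) - 1):            # interleave interpolated features
            mid = self.mix(torch.cat([f_states[i], b_states[i + 1]], 1))
            out += [feats[i], mid]
        return out + [feats[-1]]                   # 2N - 1 features


class STDFAStub(nn.Module):
    """Placeholder spatial-temporal feature aggregation: a temporal
    attention-weighted fusion shared across frames. The paper instead uses
    deformable attention to sample cross-frame contexts adaptively."""
    def __init__(self, c=64):
        super().__init__()
        self.score = nn.Conv2d(c, 1, 1)
        self.proj = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, feats):                      # list of T tensors (B, C, H, W)
        stack = torch.stack(feats, dim=1)          # (B, T, C, H, W)
        B, T, C, H, W = stack.shape
        w = self.score(stack.flatten(0, 1)).view(B, T, 1, H, W).softmax(dim=1)
        fused = (stack * w).sum(dim=1)             # shared spatio-temporal context
        return [self.proj(f + fused) for f in feats]


class STDANSketch(nn.Module):
    """End-to-end sketch: feature extraction -> LSTFI -> STDFA -> upsampling."""
    def __init__(self, c=64, scale=4):
        super().__init__()
        self.extract = nn.Conv2d(3, c, 3, padding=1)
        self.lstfi, self.stdfa = LSTFIStub(c), STDFAStub(c)
        self.up = nn.Sequential(nn.Conv2d(c, 3 * scale * scale, 3, padding=1),
                                nn.PixelShuffle(scale))

    def forward(self, lr_frames):                  # (B, N, 3, H, W): LR, LFR input
        feats = [self.extract(f) for f in lr_frames.unbind(dim=1)]
        feats = self.lstfi(feats)                  # 2N - 1 interpolated features
        feats = self.stdfa(feats)                  # enriched with temporal context
        return torch.stack([self.up(f) for f in feats], dim=1)  # HR, HFR output


if __name__ == "__main__":
    # Toy usage: 4 LR frames in, 7 HR frames out (2N - 1), spatially 4x larger.
    y = STDANSketch()(torch.randn(1, 4, 3, 32, 32))
    print(y.shape)  # torch.Size([1, 7, 3, 128, 128])
```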