Space-time video super-resolution (STVSR) aims to increase both the spatial and temporal resolution of low-resolution, low-frame-rate videos. Recent deformable-convolution-based methods have achieved promising STVSR performance, but they can only infer the intermediate frames pre-defined in the training stage. Moreover, these methods undervalue the short-term motion cues among adjacent frames. In this paper, we propose a Temporal Modulation Network (TMNet) to interpolate arbitrary intermediate frame(s) with accurate high-resolution reconstruction. Specifically, we propose a Temporal Modulation Block (TMB) to modulate deformable convolution kernels for controllable feature interpolation. To better exploit the temporal information, we propose a Locally-temporal Feature Comparison (LFC) module, along with a Bi-directional Deformable ConvLSTM, to extract short-term and long-term motion cues in videos. Experiments on three benchmark datasets demonstrate that our TMNet outperforms previous STVSR methods. The code is available at https://github.com/CS-GangXu/TMNet.
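To make the idea of time-controllable feature interpolation concrete, the following is a minimal toy sketch, not the TMNet implementation. It assumes that a time stamp t in [0, 1] is mapped to one scale factor per channel, which then modulates a linear blend of the features of the two neighboring frames. TMNet itself modulates deformable convolution kernels; the per-channel scaling, the function names `modulation_vector` and `interpolate_features`, and the `weights` and `biases` parameters are all hypothetical and chosen only for illustration.

```python
import math
from typing import List


def modulation_vector(t: float, weights: List[float], biases: List[float]) -> List[float]:
    """Map a time stamp t in [0, 1] to one scale factor per channel.

    Hypothetical stand-in for a learned modulation: each scale lies in (0, 2)
    and equals 1 when w * t + b == 0.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [1.0 + math.tanh(w * t + b) for w, b in zip(weights, biases)]


def interpolate_features(
    prev_feat: List[float],
    next_feat: List[float],
    t: float,
    weights: List[float],
    biases: List[float],
) -> List[float]:
    """Blend two per-channel feature vectors at time t, then modulate them.

    Assumption: a plain linear blend replaces the deformable sampling used
    in the paper, so only the time-conditioning idea is illustrated.
    """
    scales = modulation_vector(t, weights, biases)
    blended = [(1.0 - t) * p + t * n for p, n in zip(prev_feat, next_feat)]
    return [s * x for s, x in zip(scales, blended)]


if __name__ == "__main__":
    # Two channels; zero weights and biases give a neutral modulation (scale 1).
    prev_feat, next_feat = [0.0, 2.0], [2.0, 4.0]
    for t in (0.25, 0.5, 0.75):
        print(t, interpolate_features(prev_feat, next_feat, t, [0.0, 0.0], [0.0, 0.0]))
```

Because t is an input rather than a constant fixed at training time, the same function can be queried at any intermediate time stamp, which is the property the abstract attributes to the TMB.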