This paper studies the challenging problem of recovering motion from blur, also known as joint deblurring and interpolation or blur temporal super-resolution. Two challenges remain: 1) current methods leave considerable room for improvement in visual quality, even on synthetic data, and 2) they generalize poorly to real-world data. To this end, we propose a blur interpolation transformer (BiT) to effectively unravel the underlying temporal correlation encoded in blur. Building on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies to generate effective features for time-varying motion rendering. In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs. Experimental results show that BiT achieves a significant gain over state-of-the-art methods on the public dataset Adobe240. Moreover, the proposed real-world dataset effectively helps the model generalize to real blurry scenarios.