We consider the challenging task of training models for image-to-video deblurring, which aims to recover a sequence of sharp images from a single blurry input image. A critical issue hindering the training of an image-to-video model is the ambiguity of frame ordering: both the forward and the backward sequence are plausible solutions. This paper proposes an effective self-supervised ordering scheme that allows training high-quality image-to-video deblurring models. Unlike previous methods that rely on order-invariant losses, we assign an explicit order to each video sequence, thus avoiding the order-ambiguity issue. Specifically, we map each video sequence to a vector in a high-dimensional latent space such that there exists a hyperplane for which, for every video sequence, the vectors extracted from the sequence and from its reversed counterpart lie on opposite sides. The side on which a vector falls is then used to define the order of the corresponding sequence. Finally, we propose a real-image dataset for the image-to-video deblurring problem that covers a variety of popular domains, including face, hand, and street. Extensive experimental results confirm the effectiveness of our method. Code and data are available at https://github.com/VinAIResearch/HyperCUT.git
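The hyperplane-based ordering idea can be illustrated with a small sketch. It is not the released HyperCUT implementation: the learned mapping is replaced here by a hand-crafted, hypothetical `embed` function that weights frames antisymmetrically in time, so that a sequence and its reversal map to opposite vectors. The names `embed`, `side`, and `canonical_order`, and the choice of a hyperplane through the origin, are assumptions made only for illustration.

```python
import numpy as np


def embed(seq):
    """Toy stand-in for a learned sequence embedding (assumption, not the paper's network).

    Frames are flattened and combined with temporal weights that are
    antisymmetric about the middle frame, so embed(seq[::-1]) == -embed(seq).
    """
    seq = np.asarray(seq, dtype=np.float64)
    num_frames = seq.shape[0]
    weights = np.arange(num_frames) - (num_frames - 1) / 2.0
    flat = seq.reshape(num_frames, -1)  # (T, D)
    return weights @ flat               # (D,)


def side(seq, normal):
    """Signed score of the embedding relative to a hyperplane through the origin."""
    return float(np.dot(normal, embed(seq)))


def canonical_order(seq, normal):
    """Return the sequence in the order whose embedding lies on the non-negative side."""
    seq = np.asarray(seq)
    return seq if side(seq, normal) >= 0 else seq[::-1]
```

Under these assumptions, a sequence and its reversal always receive scores of opposite sign, so `canonical_order` returns the same frame order regardless of which direction is passed in. A training loss could then compare predictions against this single canonical order instead of using an order-invariant loss.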