We consider the challenging task of training models for image-to-video deblurring, which aims to recover a sequence of sharp images from a single blurry input image. A critical issue hindering the training of such models is the ambiguity of frame ordering, since the forward and backward sequences are both plausible solutions. This paper proposes an effective self-supervised ordering scheme that enables training high-quality image-to-video deblurring models. Unlike previous methods that rely on order-invariant losses, we assign an explicit order to each video sequence, thus avoiding the order-ambiguity issue. Specifically, we map each video sequence to a vector in a high-dimensional latent space such that there exists a hyperplane separating, for every video sequence, the vector extracted from it and the vector extracted from its reversed sequence. The side of the hyperplane on which a vector lies defines the order of the corresponding sequence. Finally, we introduce a real-image dataset for the image-to-video deblurring problem that covers a variety of popular domains, including faces, hands, and streets. Extensive experimental results confirm the effectiveness of our method. Code and data are available at https://github.com/VinAIResearch/HyperCUT.git
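The hyperplane-based ordering idea can be illustrated with a small sketch. It is not the learned mapping from the paper: it assumes a hand-built antisymmetric embedding (hypothetical function `antisymmetric_embedding`, with hypothetical `position_weights` and hyperplane `normal`) that sends a sequence and its reversal to opposite vectors, so any hyperplane through the origin puts them on different sides. The side is then used to pick one canonical order per sequence.

```python
import numpy as np


def antisymmetric_embedding(frames, position_weights):
    """Map a (T, H, W) sequence to a vector v with v(reversed) == -v(original).

    Illustrative stand-in for a learned mapping, not the paper's network.
    """
    feats = frames.reshape(frames.shape[0], -1).astype(np.float64)  # (T, D)
    forward = position_weights @ feats          # position-weighted sum, (D,)
    backward = position_weights @ feats[::-1]   # same sum over reversed frames
    return forward - backward                   # flips sign under reversal


def canonical_order(frames, position_weights, normal):
    """Return the sequence in the order whose vector is on the positive side."""
    side = float(normal @ antisymmetric_embedding(frames, position_weights))
    return frames if side >= 0 else frames[::-1]


# Toy data: all shapes and values below are assumptions for illustration.
rng = np.random.default_rng(0)
T, H, W = 7, 4, 4
frames = rng.random((T, H, W))
position_weights = np.linspace(-1.0, 1.0, T)
normal = rng.standard_normal(H * W)  # normal of a hyperplane through the origin

v_fwd = antisymmetric_embedding(frames, position_weights)
v_bwd = antisymmetric_embedding(frames[::-1], position_weights)
ordered_a = canonical_order(frames, position_weights, normal)
ordered_b = canonical_order(frames[::-1], position_weights, normal)
```

In this sketch, `ordered_a` and `ordered_b` hold the same frame order whichever direction the input arrives in, which is the property that lets a deblurring model train against a single target sequence instead of an order-invariant loss.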