Objects moving at high speed appear significantly blurred when captured with cameras. The blurry appearance is especially ambiguous when the object has a complex shape or texture. In such cases, classical methods, and even humans, are unable to recover the object's appearance and motion. We propose a method that, given a single image with its estimated background, outputs the object's appearance and position in a series of sub-frames as if captured by a high-speed camera (i.e., temporal super-resolution). The proposed generative model embeds an image of the blurred object into a latent space representation, disentangles the background, and renders the sharp appearance. Inspired by the image formation model, we design novel self-supervised loss terms that boost performance and show good generalization capabilities. The proposed DeFMO method is trained on a complex synthetic dataset, yet it performs well on real-world data from several datasets. DeFMO outperforms the state of the art and generates high-quality temporal super-resolution frames.
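The image formation model referenced above can be illustrated with a minimal sketch: a motion-blurred observation is modeled as the average of sharp sub-frame composites, each blending the object's appearance over the background via a soft mask. The function name, array shapes, and averaging form below are illustrative assumptions for exposition, not code from the DeFMO implementation.

```python
import numpy as np

def composite_subframes(appearances, masks, background):
    """Reconstruct a blurred input as the average of sub-frame composites.

    appearances: (T, H, W, 3) sharp object appearance per sub-frame
    masks:       (T, H, W, 1) soft object mask per sub-frame
    background:  (H, W, 3)    estimated static background

    Each sub-frame composites the object over the background with its
    mask; averaging over the T sub-frames simulates motion blur.
    """
    frames = masks * appearances + (1.0 - masks) * background
    return frames.mean(axis=0)

# Sanity check: with all-zero masks, the reconstruction is the background.
rng = np.random.default_rng(0)
B = rng.random((8, 8, 3))
F = rng.random((4, 8, 8, 3))
M = np.zeros((4, 8, 8, 1))
out = composite_subframes(F, M, B)
```

A self-supervised reconstruction loss can then compare this composite against the actual blurred input, which is the kind of term the abstract alludes to.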