We seek to answer the question: what can a motion-blurred image reveal about a scene's past, present, and future? Although motion blur obscures image details and degrades visual quality, it also encodes information about scene and camera motion during an exposure. Previous techniques leverage this information to estimate a sharp image from an input blurry one, or to predict a sequence of video frames showing what might have occurred at the moment of image capture. However, they rely on handcrafted priors or network architectures to resolve ambiguities in this inverse problem, and do not incorporate image and video priors learned from large-scale datasets. As such, existing methods struggle to reproduce complex scene dynamics and do not attempt to recover what occurred before or after an image was taken. Here, we introduce a new technique that repurposes a video diffusion model pre-trained on internet-scale data to recover videos revealing complex scene dynamics during the moment of capture and what might have occurred immediately into the past or future. Our approach is robust and versatile; it outperforms previous methods for this task, generalizes to challenging in-the-wild images, and supports downstream tasks such as recovering camera trajectories, object motion, and dynamic 3D scene structure. Code and data are available at https://blur2vid.github.io.
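The inverse problem described above rests on a standard forward model: a motion-blurred image can be modeled as the temporal average of the sharp frames captured during the exposure. The sketch below illustrates this forward model only (not the paper's diffusion-based inversion); the frame count, resolution, and moving-square scene are illustrative assumptions, not details from the paper.

```python
import numpy as np

def simulate_blur(frames: np.ndarray) -> np.ndarray:
    """Average a (T, H, W, C) stack of sharp frames into one blurry image.

    This is the common approximation of motion blur as the mean of the
    latent sharp frames over the exposure window.
    """
    return frames.mean(axis=0)

# Toy scene (an illustrative assumption): a 4x4 bright square translating
# one pixel per frame across 8 sharp frames.
T, H, W = 8, 16, 16
frames = np.zeros((T, H, W, 3), dtype=np.float64)
for t in range(T):
    frames[t, 6:10, t:t + 4, :] = 1.0  # square shifts right each frame

blurry = simulate_blur(frames)
# The square's path appears as a horizontal streak whose intensity at each
# pixel reflects how long the square covered it during the "exposure".
print(blurry.shape)  # (16, 16, 3)
```

Recovering the frame stack (and the direction of time) from `blurry` alone is ambiguous, which is why the paper leans on priors from a pre-trained video diffusion model rather than handcrafted regularizers.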