Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. To support this work and future research, we release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at www.learn2refocus.github.io.