Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method for realistic post-capture refocusing using video diffusion models. From a single defocused image, our approach generates a perceptually accurate focal stack, represented as a video sequence, enabling interactive refocusing and unlocking a range of downstream applications. We release a large-scale focal stack dataset acquired under diverse real-world smartphone conditions to support this work and future research. Our method consistently outperforms existing approaches in both perceptual quality and robustness across challenging scenarios, paving the way for more advanced focus-editing capabilities in everyday photography. Code and data are available at https://learn2refocus.github.io