Depth-from-defocus (DFD), which models the relationship between scene depth and the defocus pattern in images, has demonstrated promising performance in depth estimation. Recently, several self-supervised works have attempted to overcome the difficulty of acquiring accurate depth ground truth. However, they depend on all-in-focus (AIF) images, which cannot be captured directly in real-world scenarios. This limitation hinders the application of DFD methods. To tackle this issue, we propose a completely self-supervised framework that estimates depth purely from a sparse focal stack. We show that our framework circumvents the need for both depth and AIF image ground truth and yields superior predictions, thus closing the gap between the theoretical success of DFD methods and their real-world applications. In particular, we propose (i) a more realistic setting for DFD tasks, in which no depth or AIF image ground truth is available; and (ii) a novel self-supervision framework that provides reliable predictions of both the depth and the AIF image under this challenging setting. The proposed framework uses a neural model to predict the depth and AIF image, and an optical model to validate and refine the prediction. We verify our framework on three benchmark datasets with both rendered and real focal stacks. Qualitative and quantitative evaluations show that our method provides a strong baseline for self-supervised DFD tasks.
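The optical model in such a self-supervised loop typically re-renders a defocused image from the predicted depth and AIF image, so the rendering can be compared against the captured focal stack. A minimal sketch of this idea is given below, assuming a thin-lens circle-of-confusion model and a layered Gaussian approximation to spatially varying blur; the camera parameters (`focal_len`, `aperture`, `px_per_m`) and the helper names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def coc_radius(depth, focus_dist, focal_len=0.05, aperture=0.01, px_per_m=1e4):
    """Thin-lens circle-of-confusion radius, converted from meters to pixels.

    depth, focus_dist are object-side distances in meters; parameters are
    illustrative defaults, not calibrated values.
    """
    coc_m = aperture * focal_len * np.abs(depth - focus_dist) / (
        depth * (focus_dist - focal_len))
    return coc_m * px_per_m

def render_defocus(aif, depth, focus_dist, n_layers=8):
    """Approximate a defocused image from an AIF image and a depth map.

    The depth range is discretized into layers; each layer is blurred with a
    Gaussian whose sigma is the layer's mean CoC, then the layers are
    alpha-composited with blurred masks to soften layer boundaries.
    """
    out = np.zeros_like(aif)
    weight = np.zeros(depth.shape)
    edges = np.linspace(depth.min(), depth.max() + 1e-6, n_layers + 1)
    for i in range(n_layers):
        mask = (depth >= edges[i]) & (depth < edges[i + 1])
        if not mask.any():
            continue
        # one blur kernel per layer, from the layer's mean depth
        sigma = max(float(coc_radius(edges[i:i + 2].mean(), focus_dist)), 1e-3)
        blurred = gaussian_filter(aif, sigma=(sigma, sigma, 0))
        m = gaussian_filter(mask.astype(float), sigma=sigma)
        out += blurred * m[..., None]
        weight += m
    return out / np.maximum(weight, 1e-6)[..., None]
```

In a self-supervised setting, the loss would compare `render_defocus(pred_aif, pred_depth, focus_dist_k)` against each captured slice `k` of the focal stack, so neither depth nor AIF ground truth is needed.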