Depth estimation is a long-standing yet important task in computer vision. Most previous works estimate depth from input images and assume that the images are all-in-focus (AiF), which is rarely the case in real-world applications. A few works instead take defocus blur into account and treat it as an additional cue for depth estimation. In this paper, we propose a method that estimates not only a depth map but also an AiF image from a set of images captured at different focus positions (known as a focal stack). We design a shared architecture that exploits the relationship between depth and AiF estimation. As a result, the proposed method can be trained either supervisedly with ground-truth depth, or \emph{unsupervisedly} with AiF images as supervisory signals. Various experiments show that our method outperforms state-of-the-art methods both quantitatively and qualitatively, and is also more efficient at inference time.
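To make the shared-architecture idea concrete, the following is a minimal sketch (not the authors' implementation) of how a single set of per-pixel weights over focal-stack slices can produce both a depth map and an AiF image, which in turn enables unsupervised training against an AiF reference. All names, shapes, and the `depth_and_aif` helper are illustrative assumptions.

```python
import torch

def depth_and_aif(stack, focus_positions, logits):
    """Hypothetical helper: derive depth and AiF from shared per-pixel weights.

    stack:            (S, 3, H, W) focal stack of S slices
    focus_positions:  (S,) focus distance associated with each slice
    logits:           (S, H, W) per-pixel scores predicted by some network
    """
    # Shared weights: how strongly each slice is "in focus" at each pixel.
    weights = torch.softmax(logits, dim=0)                      # (S, H, W)
    # Depth as the expected focus position under these weights.
    depth = (weights * focus_positions.view(-1, 1, 1)).sum(0)   # (H, W)
    # AiF image as the weighted combination of the focal-stack slices.
    aif = (weights.unsqueeze(1) * stack).sum(0)                 # (3, H, W)
    return depth, aif

# Unsupervised training signal (assumed): compare the reconstructed AiF image
# with an AiF reference instead of requiring ground-truth depth, e.g.
# loss = torch.nn.functional.l1_loss(aif, aif_reference)
```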