Focus-based methods have shown promising results for the task of depth estimation. However, most existing focus-based depth estimation approaches depend on the maximal sharpness of the focal stack, and out-of-focus information in the focal stack poses challenges for this task. In this paper, we propose a dynamic multi-modal learning strategy that incorporates RGB data and the focal stack into our framework. Our goal is to deeply exploit the spatial correlation in the focal stack by designing a spatial correlation perception module, and to dynamically fuse the multi-modal information between RGB data and the focal stack in an adaptive way by designing a multi-modal dynamic fusion module. The success of our method is demonstrated by achieving state-of-the-art performance on two datasets. Furthermore, we test our network on a set of differently focused images captured by a smartphone camera to show that the proposed method not only breaks the limitation of relying solely on light-field data, but also opens a path toward practical depth estimation on data from common consumer-level cameras.
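The abstract does not specify the internals of the two modules, so the following is only a minimal PyTorch sketch of one plausible reading: the spatial correlation perception module is approximated here by 3D convolutions over the focal-stack slices followed by pooling across slices, and the multi-modal dynamic fusion module by a learned per-pixel gate that adaptively weights RGB features against focal-stack features. The class names, channel widths, and gating scheme are all illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class SpatialCorrelationPerception(nn.Module):
    """Hypothetical sketch: capture correlations across focal-stack slices
    with 3D convolutions over the (slice, H, W) volume, then pool slices."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.conv3d = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, focal_stack: torch.Tensor) -> torch.Tensor:
        # focal_stack: (B, 3, S, H, W), where S is the number of focal slices
        feat = self.conv3d(focal_stack)   # (B, C, S, H, W)
        return feat.max(dim=2).values     # pool over slices -> (B, C, H, W)

class MultiModalDynamicFusion(nn.Module):
    """Hypothetical sketch: predict a per-pixel gate so the network can
    weight RGB features against focal-stack features adaptively."""
    def __init__(self, channels: int = 32):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, focal_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([rgb_feat, focal_feat], dim=1))  # (B, 1, H, W)
        return g * rgb_feat + (1 - g) * focal_feat

# Toy usage: a 5-slice focal stack plus an RGB feature map of matching size.
rgb_encoder = nn.Conv2d(3, 32, kernel_size=3, padding=1)
scp = SpatialCorrelationPerception()
fusion = MultiModalDynamicFusion()
rgb = torch.randn(1, 3, 64, 64)
stack = torch.randn(1, 3, 5, 64, 64)
fused = fusion(rgb_encoder(rgb), scp(stack))
print(fused.shape)  # torch.Size([1, 32, 64, 64])
```

The sigmoid gate makes the fusion "dynamic" in the sense the abstract suggests: at pixels where the focal stack is uninformative (e.g., uniformly out of focus), the network can learn to lean on RGB features instead, and vice versa.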