For better photography, most recent commercial cameras, including those in smartphones, either adopt large-aperture lenses to collect more light or use a burst mode to capture multiple images within a short time. These features lead us to examine depth from focus/defocus. In this work, we present a convolutional neural network-based method for depth estimation from a single focal stack. Our method differs from relevant state-of-the-art works in three unique features. First, it allows depth maps to be inferred in an end-to-end manner, even with image alignment. Second, we propose a sharp region detection module to reduce blur ambiguities in regions of subtle focus change and weak texture. Third, we design an effective downsampling module to ease the flow of focal information during feature extraction. In addition, to improve the generalization of the proposed network, we develop a simulator that realistically reproduces the characteristics of commercial cameras, such as changes in field of view, focal length, and principal point. By effectively incorporating these three unique features, our network achieves the top rank on the DDFF 12-Scene benchmark on most metrics. We also demonstrate the effectiveness of the proposed method through various quantitative evaluations and on real-world images taken with various off-the-shelf cameras, in comparison with state-of-the-art methods. Our source code is publicly available at https://github.com/wcy199705/DfFintheWild.
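The learned pipeline described above is not reproduced here, but the classical depth-from-focus idea it builds on can be sketched in a few lines: compute a per-pixel sharpness (focus) measure for each slice of the focal stack, then take the index of the sharpest slice as a depth proxy. This is a minimal illustrative baseline, not the authors' network; the variance-of-Laplacian-style measure and the synthetic stack below are assumptions chosen for clarity.

```python
import numpy as np

def focus_measure(img):
    """Simple sharpness map: squared response of a discrete 4-neighbour
    Laplacian, smoothed with a 3x3 box filter (wrap-around borders)."""
    lap = (-4.0 * img
           + np.roll(img, 1, 0) + np.roll(img, -1, 0)
           + np.roll(img, 1, 1) + np.roll(img, -1, 1))
    sharp = lap ** 2
    # 3x3 box smoothing to stabilise the per-pixel measure
    acc = np.zeros_like(sharp)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(sharp, dy, 0), dx, 1)
    return acc / 9.0

def depth_from_focus(stack):
    """stack: (S, H, W) focal stack. Returns, per pixel, the index of the
    sharpest slice, a coarse proxy for depth."""
    measures = np.stack([focus_measure(f) for f in stack])
    return np.argmax(measures, axis=0)

# Tiny synthetic demo: three slices, each "in focus" (textured) in a
# different horizontal band, defocused (flat) elsewhere.
S, H, W = 3, 32, 32
yy, xx = np.mgrid[0:H, 0:W]
checker = ((yy + xx) % 2).astype(float)   # high-frequency texture
stack = np.zeros((S, H, W))
for k, (r0, r1) in enumerate([(0, 10), (11, 21), (22, 32)]):
    stack[k, r0:r1, :] = checker[r0:r1, :]

depth = depth_from_focus(stack)           # sharpest-slice index per pixel
```

Unlike this hand-crafted argmax, the network in the paper learns the sharpness cue jointly with alignment, which is what resolves the subtle-focus-change and weak-texture ambiguities the abstract mentions.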