Monocular depth estimation and defocus estimation are two fundamental tasks in computer vision. Most existing methods treat depth estimation and defocus estimation as two separate tasks, ignoring the strong connection between them. In this work, we propose a multi-task learning network consisting of a shared encoder with two decoders to estimate the depth map and defocus map from a single focused image. Within this multi-task network, depth estimation helps defocus estimation achieve better results in weakly textured regions, while defocus estimation helps depth estimation through the strong physical connection between the two maps. We build a dataset (named the ALL-in-3D dataset), which is the first all-real-image dataset, consisting of 100K sets of all-in-focus images, focused images with focus depth, depth maps, and defocus maps. It enables the network to learn features and the solid physical connection between depth and real defocus blur. Experiments demonstrate that the network learns more robust features from real focused images than from synthetic focused images. Benefiting from this multi-task structure in which the two tasks facilitate each other, our depth and defocus estimations achieve significantly better performance than other state-of-the-art algorithms. The code and dataset will be publicly available at https://github.com/cubhe/MDDNet.