Depth prediction is fundamental to many applications in computer vision and robotics. On mobile phones, the performance of applications such as augmented reality and autofocus can be enhanced by accurate depth prediction. In this work, an efficient fully convolutional network architecture for depth prediction is proposed, which uses RegNetY 06 as the encoder and split-concatenate shuffle blocks as the decoder. In addition, an appropriate combination of data augmentation, hyper-parameters, and loss functions is provided to train the lightweight network efficiently. An Android application has also been developed that loads CNN models to predict depth maps from monocular images captured by the mobile camera and evaluates each model's average latency and frames per second. As a result, the network achieves 82.7% δ1 accuracy on the NYU Depth v2 dataset while incurring only 62 ms latency on ARM A76 CPUs, so it can predict depth maps from the mobile camera in real time.
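The δ1 accuracy quoted above is the standard threshold metric for monocular depth estimation: the fraction of pixels whose predicted depth is within a factor of 1.25 of the ground truth. A minimal NumPy sketch of this metric (the function name and sample values are illustrative, not from the paper):

```python
import numpy as np

def delta1_accuracy(pred, gt, threshold=1.25):
    """Fraction of pixels where max(pred/gt, gt/pred) < threshold.

    This is the delta_1 metric commonly reported on NYU Depth v2;
    delta_2 and delta_3 use thresholds of 1.25**2 and 1.25**3.
    """
    ratio = np.maximum(pred / gt, gt / pred)
    return float(np.mean(ratio < threshold))

# Example: three of the four pixels fall inside the 1.25 ratio band.
pred = np.array([1.0, 2.0, 3.0, 5.0])
gt   = np.array([1.1, 2.1, 3.2, 3.0])
print(delta1_accuracy(pred, gt))  # → 0.75
```

In practice the metric is computed only over valid ground-truth pixels (depth sensors leave holes), so a boolean mask is usually applied to both arrays before the ratio is taken.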