Estimating the depth of comics images is challenging because such images a) are monocular; b) lack ground-truth depth annotations; c) vary across artistic styles; and d) are sparse and noisy. We therefore use an off-the-shelf unsupervised image-to-image translation method to translate the comics images into natural ones, and then use an attention-guided monocular depth estimator to predict their depth. This lets us leverage the depth annotations of existing natural images to train the depth estimator. Furthermore, our model learns to distinguish between text and images in the comics panels, which reduces text-based artefacts in the depth estimates. Our method consistently outperforms the existing state-of-the-art approaches across all metrics on both the DCM and eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction on comics.
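The two-stage design described above (translate the comics panel to the natural-image domain, then run a monocular depth estimator) can be sketched roughly as follows. This is only an illustrative outline under stated assumptions: `translate_to_natural`, `estimate_depth`, `suppress_text`, and `predict_comics_depth` are hypothetical names, the two model stages are replaced by trivial placeholders, and the post-hoc text masking is a stand-in for, not a reproduction of, the learned text/image distinction mentioned in the abstract.

```python
import numpy as np


def translate_to_natural(comics_image: np.ndarray) -> np.ndarray:
    """Placeholder for the unsupervised image-to-image translation stage.

    Assumption: a real system would call a trained translation network here.
    This stand-in only rescales pixel values to [0, 1].
    """
    return comics_image.astype(np.float32) / 255.0


def estimate_depth(natural_image: np.ndarray) -> np.ndarray:
    """Placeholder for the monocular depth estimator.

    Assumption: a real system would call a network trained on natural images
    with depth annotations. This stand-in returns a top-to-bottom ramp with
    the same height and width as the input.
    """
    h, w = natural_image.shape[:2]
    ramp = np.linspace(1.0, 0.0, h, dtype=np.float32)
    return np.tile(ramp[:, None], (1, w))


def suppress_text(depth: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """Overwrite depth at text pixels with the median non-text depth.

    Assumption: simple post-processing chosen for illustration only.
    """
    out = depth.copy()
    if text_mask.any() and (~text_mask).any():
        out[text_mask] = np.median(depth[~text_mask])
    return out


def predict_comics_depth(comics_image: np.ndarray, text_mask: np.ndarray) -> np.ndarray:
    """Chain the stages: translate, estimate depth, suppress text regions."""
    natural = translate_to_natural(comics_image)
    depth = estimate_depth(natural)
    return suppress_text(depth, text_mask)
```

The point of the sketch is the ordering of the stages: because depth is predicted on the translated image, the estimator can be trained entirely on natural images that already have depth annotations.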