Image editing and compositing have become ubiquitous in entertainment, from digital art to AR and VR experiences. To produce beautiful composites, the camera needs to be geometrically calibrated, which can be tedious and requires a physical calibration target. In place of the traditional multi-image calibration process, we propose to infer camera calibration parameters such as pitch, roll, field of view, and lens distortion directly from a single image using a deep convolutional neural network. We train this network on samples generated automatically from a large-scale panorama dataset, yielding competitive accuracy in terms of standard L2 error. However, we argue that minimizing such standard error metrics might not be optimal for many applications. In this work, we investigate human sensitivity to inaccuracies in geometric camera calibration. To this end, we conduct a large-scale human perception study in which we ask participants to judge the realism of 3D objects composited with correct and biased camera calibration parameters. Based on this study, we develop a new perceptual measure for camera calibration and demonstrate that our deep calibration network outperforms previous single-image-based calibration methods both on standard metrics and on this novel perceptual measure. Finally, we demonstrate the use of our calibration network in several applications, including virtual object insertion, image retrieval, and compositing. A demonstration of our approach is available at https://lvsn.github.io/deepcalib .
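To make the calibration parameters named above concrete, the following minimal sketch relates field of view and pitch to image-space quantities under an ideal pinhole model with no lens distortion, and computes a plain L2 error between a predicted and a reference parameter vector. It is an illustration only, not the network or the evaluation code of this work; the function names, the sign convention for pitch, and the choice of a vertical field of view are assumptions made for the example.

```python
import math


def focal_length_px(vfov_deg, image_height_px):
    """Pinhole focal length in pixels from the vertical field of view.

    Assumes an ideal pinhole camera with no lens distortion.
    """
    return (image_height_px / 2.0) / math.tan(math.radians(vfov_deg) / 2.0)


def horizon_offset_px(pitch_deg, vfov_deg, image_height_px):
    """Distance in pixels between the image center and the horizon line,
    measured along the central image column.

    Assumed convention: zero pitch puts the horizon at the image center,
    and a larger absolute pitch moves it further away.
    """
    f = focal_length_px(vfov_deg, image_height_px)
    return f * math.tan(math.radians(pitch_deg))


def l2_error(predicted, reference):
    """Euclidean (L2) distance between two parameter vectors."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)))


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(focal_length_px(60.0, 480))
    print(horizon_offset_px(10.0, 60.0, 480))
    print(l2_error([10.0, 2.0, 60.0], [12.0, 1.0, 55.0]))
```

An L2 error of this kind treats every unit of parameter error alike, whereas the perceptual measure discussed above is motivated by the observation that viewers may not be equally sensitive to all such errors.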