Modern cameras are equipped with a wide array of sensors that can record the geospatial context of an image. Taking advantage of this, we explore depth estimation under the assumption that the camera is geocalibrated, a problem we refer to as geo-enabled depth estimation. Our key insight is that if the capture location is known, the corresponding overhead viewpoint offers a valuable resource for understanding the scale of the scene. We propose an end-to-end architecture for depth estimation that uses geospatial context to infer a synthetic ground-level depth map from a co-located overhead image, then fuses it inside an encoder/decoder-style segmentation network. To support evaluation of our methods, we extend a recently released dataset with overhead imagery and corresponding height maps. Results demonstrate that integrating geospatial context significantly reduces error compared to baselines, both at close range and when evaluating at much larger distances than existing benchmarks consider.
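To make the idea of a synthetic ground-level depth map concrete, the following is a minimal geometric sketch, not the architecture described above (which is learned end to end). It assumes a height map is already available as a 2D array and marches rays outward from the camera location until they fall below the surface. The function name, its parameters, the north-up grid convention, and the default camera height are all illustrative assumptions.

```python
import numpy as np


def synthetic_depth_panorama(height_map, cam_rc, cam_height=1.8,
                             meters_per_pixel=1.0, n_azimuth=8,
                             elevations_deg=(0.0,), max_range=100.0,
                             step=0.5):
    """Ray-march a height map to get depth per (elevation, azimuth).

    Assumptions (hypothetical, for illustration only):
    - height_map[r, c] is surface height in meters, north is row 0.
    - cam_rc is the (row, col) pixel of the capture location.
    - Rays that leave the map or exceed max_range keep depth max_range.
    """
    rows, cols = height_map.shape
    r0, c0 = cam_rc
    # Camera sits cam_height meters above the surface at its location.
    cam_z = height_map[r0, c0] + cam_height
    # Azimuth 0 points north, increasing clockwise (pi/2 is east).
    azimuths = np.linspace(0.0, 2.0 * np.pi, n_azimuth, endpoint=False)
    depth = np.full((len(elevations_deg), n_azimuth), float(max_range))

    for i, elev in enumerate(np.deg2rad(elevations_deg)):
        for j, az in enumerate(azimuths):
            for dist in np.arange(step, max_range, step):
                horiz = dist * np.cos(elev)  # ground-plane distance
                r = int(round(r0 - horiz * np.cos(az) / meters_per_pixel))
                c = int(round(c0 + horiz * np.sin(az) / meters_per_pixel))
                if not (0 <= r < rows and 0 <= c < cols):
                    break  # ray left the overhead tile
                ray_z = cam_z + dist * np.sin(elev)
                if ray_z <= height_map[r, c]:
                    depth[i, j] = dist  # first surface intersection
                    break
    return depth


if __name__ == "__main__":
    # Toy scene: flat ground with a 10 m tall block east of the camera.
    hm = np.zeros((101, 101))
    hm[:, 60:] = 10.0
    pano = synthetic_depth_panorama(hm, (50, 50), n_azimuth=4,
                                    elevations_deg=(0.0, -10.0))
    print(pano)
```

In this toy scene the eastward horizontal ray stops at the block roughly 10 m away, while rays in the other horizontal directions never hit anything and keep the `max_range` value. A depth map produced this way is coarse, since its accuracy is bounded by the height-map resolution and the step size, which is one reason to treat it as an additional input to a network rather than as a final estimate.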