Large collections of geo-referenced panoramic images are freely available for cities across the globe, as are detailed maps with location and metadata on a great variety of urban objects. Together they provide a potentially rich source of information on urban objects, but manual annotation for object detection is costly, laborious and difficult. Can we utilize such multimedia sources to automatically annotate street-level images as an inexpensive alternative to manual labeling? With the PanorAMS framework, we introduce a method to automatically generate bounding box annotations for panoramic images based on urban context information. Following this method, we acquire large-scale, albeit noisy, annotations for an urban dataset solely from open data sources in a fast and automatic manner. The dataset covers the City of Amsterdam and includes over 14 million noisy bounding box annotations of 22 object categories present in 771,299 panoramic images. For many objects, further fine-grained information is available from geospatial metadata, such as building value, function and average surface area. Such information would have been difficult, if not impossible, to acquire via manual labeling based on the image alone. For detailed evaluation, we introduce an efficient crowdsourcing protocol for bounding box annotations in panoramic images, which we deploy to acquire 147,075 ground-truth object annotations for a subset of 7,348 images, the PanorAMS-clean dataset. For our PanorAMS-noisy dataset, we provide an extensive analysis of the noise and of how different types of noise affect image classification and object detection performance. We make both datasets, PanorAMS-noisy and PanorAMS-clean, as well as the benchmarks and tools presented in this paper, openly available.
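To make the idea of annotating panoramas from map data concrete, the following is a minimal sketch of how a geo-referenced object might be mapped to a horizontal pixel range in a panorama. It is not the PanorAMS method itself: it assumes an equirectangular panorama whose center column faces the camera heading, a flat-earth approximation that is only reasonable over short distances, and a known object width in meters. All function names and parameters are hypothetical and chosen for illustration only.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; assumption for the flat-earth approximation


def bearing_and_distance(cam_lat, cam_lon, obj_lat, obj_lon):
    """Approximate bearing (degrees clockwise from north) and distance (meters)
    from camera to object, using a local flat-earth approximation."""
    mean_lat = math.radians((cam_lat + obj_lat) / 2)
    north = math.radians(obj_lat - cam_lat) * EARTH_RADIUS_M
    east = math.radians(obj_lon - cam_lon) * EARTH_RADIUS_M * math.cos(mean_lat)
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    return bearing, math.hypot(east, north)


def bearing_to_column(bearing_deg, heading_deg, image_width):
    """Map a compass bearing to a pixel column, assuming the center column of
    the equirectangular panorama points along the camera heading."""
    relative = (bearing_deg - heading_deg + 180.0) % 360.0 - 180.0  # in [-180, 180)
    return (relative / 360.0 + 0.5) * image_width


def horizontal_extent(cam, heading_deg, obj, obj_width_m, image_width):
    """Return (x_min, x_max) pixel columns spanned by an object of the given width.
    If the object straddles the panorama seam, x_min can exceed x_max."""
    bearing, distance = bearing_and_distance(cam[0], cam[1], obj[0], obj[1])
    half_angle = math.degrees(math.atan2(obj_width_m / 2, distance))
    x_min = bearing_to_column(bearing - half_angle, heading_deg, image_width)
    x_max = bearing_to_column(bearing + half_angle, heading_deg, image_width)
    return x_min, x_max


if __name__ == "__main__":
    camera = (52.3700, 4.8900)    # illustrative coordinates in Amsterdam
    building = (52.3710, 4.8900)  # roughly 111 m due north of the camera
    print(horizontal_extent(camera, 0.0, building, 10.0, 8000))
```

A sketch like this yields only the horizontal extent. The vertical extent would additionally need object height and camera elevation, and errors in any of these inputs are one plausible source of the annotation noise that the abstract mentions.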