利用多视图和多日期卫星图像和新闻的OSM培训标签,对大地区地理区域进行语语义标记 (Semantic Labeling of Large-Area Geographic Regions Using Multi-View and Multi-Date Satellite Images and Noisy OSM Training Labels)

from arxiv, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

We present a novel multi-view training framework and CNN architecture for combining information from multiple overlapping satellite images and noisy training labels derived from OpenStreetMap (OSM) to semantically label buildings and roads across large geographic regions (100 km$^2$). Our approach to multi-view semantic segmentation yields a 4-7% improvement in the per-class IoU scores compared to the traditional approaches that use the views independently of one another. A unique (and, perhaps, surprising) property of our system is that modifications that are added to the tail-end of the CNN for learning from the multi-view data can be discarded at the time of inference with a relatively small penalty in the overall performance. This implies that the benefits of training using multiple views are absorbed by all the layers of the network. Additionally, our approach only adds a small overhead in terms of the GPU-memory consumption even when training with as many as 32 views per scene. The system we present is end-to-end automated, which facilitates comparing the classifiers trained directly on true orthophotos vis-a-vis first training them on the off-nadir images and subsequently translating the predicted labels to geographical coordinates. With no human supervision, our IoU scores for the buildings and roads classes are 0.8 and 0.64 respectively which are better than state-of-the-art approaches that use OSM labels and that are not completely automated.

翻译：我们提出了一个新颖的多视角培训框架和CNN架构,将来自OpenStreetMap(OSM)的多重叠卫星图像和从多视角数据中学习多视角数据的大量培训标签(OSM)的信息与大地理区域(100公里美元=2美元)的大地标签建筑物和道路(100公里=2美元)的封条标签和道路(100公里=2美元)的信息结合起来,我们采用多视图语义分割法的方法,使每类IOU分数增加了4-7%的改善,而传统方法则是独立使用不同观点。我们系统的一个独特(或许是令人惊讶的)属性是,在从多视角数据中学习的CNN尾端的修改可以在推断时被丢弃,总体表现的处罚相对较轻。这意味着,使用多种观点进行的培训的好处被网络各层吸收。此外,我们的方法只增加了少量的GPU-模消费的间接成本,即使每次培训有32种不同观点。我们所介绍的系统是端对终端自动化的系统,便于比较直接从真实或phoptoto中学习多视角数据的分类者,在总体业绩中可以被忽略。这意味着,使用的培训的好处是相对相对较小的惩罚相对较少的,使用多种观点的处罚。这意味着培训的好处被吸收的培训的好处是所有网络的优势。使用培训的好处,使用培训的好处被吸收到网络系统,而后,而后在网络系统在网络结构结构上将它们被分别在网络的频率的频率的频率的频率上被分别被分别被分别是比相互协调。的频率的图像的频率的图像的图像的频率的图像的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的频率的图像的频率的频率的频率的频率是比比比相互比较。