In this paper, we investigate the problem of semantic segmentation for agricultural aerial imagery. We observe that existing methods for this task are designed without considering two characteristics of aerial data: (i) the top-down perspective implies that the model cannot rely on a fixed semantic structure of the scene, because the same scene may be observed under different rotations of the sensor; (ii) the distribution of semantic classes can be strongly imbalanced, because the relevant objects in the scene may appear at extremely different scales (e.g., a field of crops and a small vehicle). We propose a solution to these problems based on two ideas: (i) we combine a set of suitable augmentations with a consistency loss to guide the model toward semantic representations that are invariant to the photometric and geometric shifts typical of the top-down perspective (Augmentation Invariance); (ii) we use a sampling method (Adaptive Sampling) that selects the training images based on a measure of the pixel-wise distribution of classes and the network's current confidence. With an extensive set of experiments conducted on the Agriculture-Vision dataset, we demonstrate that our proposed strategies improve the performance of the current state-of-the-art method.
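To make the two ideas concrete, the following is a minimal sketch in plain Python, not the implementation used in the paper. It assumes a model that maps a 2-D image to a 2-D map of per-pixel scores, uses a single 90-degree rotation as the only augmentation, and weights each image by its class fractions scaled by (1 - confidence) over the global class frequency. The function names (`consistency_loss`, `sampling_weights`) and this particular weighting formula are illustrative assumptions; the abstract does not specify them.

```python
def rot90(grid):
    """Rotate a 2-D list 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def rot270(grid):
    """Rotate a 2-D list 90 degrees counter-clockwise (inverse of rot90)."""
    return [list(row) for row in zip(*grid)][::-1]


def consistency_loss(model, image):
    """Mean squared difference between the prediction on the original image
    and the prediction on a rotated copy, mapped back to the original frame.
    A rotation-invariant model yields 0. (Hypothetical loss for illustration.)"""
    pred = model(image)
    pred_back = rot270(model(rot90(image)))
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(pred, pred_back)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def sampling_weights(label_maps, class_conf, num_classes):
    """Assumed weighting: images containing rare, low-confidence classes
    receive a higher probability of being sampled for training."""
    per_image = []
    global_counts = [0] * num_classes
    for label_map in label_maps:
        counts = [0] * num_classes
        for row in label_map:
            for label in row:
                counts[label] += 1
        per_image.append(counts)
        for k in range(num_classes):
            global_counts[k] += counts[k]
    total = sum(global_counts)

    weights = []
    for counts in per_image:
        n = sum(counts)
        weights.append(sum(
            (counts[k] / n) * (1.0 - class_conf[k]) / (global_counts[k] / total)
            for k in range(num_classes)
            if global_counts[k] > 0
        ))
    norm = sum(weights)
    if norm == 0:  # network fully confident everywhere: fall back to uniform
        return [1.0 / len(weights)] * len(weights)
    return [w / norm for w in weights]
```

The resulting weights can be passed to `random.choices(images, weights=...)` to draw a training batch. In an actual training loop the consistency term would be added to the supervised segmentation loss, and the per-class confidences would be re-estimated from the network as training progresses.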