Developing semi-supervised learning techniques is essential for improving the generalization capacity of machine learning algorithms. Raw image data are abundant while labels are scarce, so it is crucial to leverage unlabeled inputs to build better models. The availability of large databases has been key to the development of high-performance learning algorithms. Despite the major role of machine learning in Earth Observation for deriving products such as land cover maps, datasets in the field are still limited, whether because of modest surface coverage, a lack of scene variety, or a restricted set of classes to identify. We introduce a novel large-scale dataset for semi-supervised semantic segmentation in Earth Observation, the MiniFrance suite. MiniFrance has several unprecedented properties: it is large-scale, containing over 2000 very high resolution aerial images that account for more than 200 billion samples (pixels); it is varied, covering 16 conurbations in France with various climates, different landscapes, and both urban and countryside scenes; and it is challenging, as it considers land use classes with high-level semantics. Above all, the most distinctive quality of MiniFrance is that it is the only dataset in the field specifically designed for semi-supervised learning: its training partition contains both labeled and unlabeled images, which reproduces a realistic scenario. Along with this dataset, we present tools for analyzing data representativeness in terms of appearance similarity, together with a thorough study of MiniFrance data demonstrating that it is suitable for learning and generalizes well in a semi-supervised setting. Finally, we present semi-supervised deep architectures based on multi-task learning and the first experiments on MiniFrance.