With the emergence of deep learning in recent years, new opportunities have arisen in Earth observation research. Nevertheless, these opportunities have brought new challenges. The data-hungry training processes of deep learning models demand large, resource-intensive, annotated data sets, and they have partly replaced knowledge-driven approaches, so that model behaviour and the final prediction process have become a black box. The proposed SyntEO approach enables Earth observation researchers to automatically generate large, deep-learning-ready data sets by merging existing and procedural data. SyntEO does this by including expert knowledge in the data generation process in a highly structured manner: an ontology controls the automatic image and label generation. In this way, fully controllable experiment environments are set up, which support insights into model training on the synthetic data sets. Thus, SyntEO makes the learning process approachable, which is an important cornerstone for explainable machine learning. We demonstrate the SyntEO approach by predicting offshore wind farms in Sentinel-1 images of two of the world's largest offshore wind energy production sites. The largest generated data set has 90,000 training examples. A basic convolutional neural network for object detection, trained only on this synthetic data, confidently detects offshore wind farms while minimising false detections in challenging environments. In addition, four sequential data sets are generated, demonstrating how the SyntEO approach can precisely define the data set structure and influence the training process. SyntEO is thus a hybrid approach that creates an interface between expert knowledge and data-driven image analysis.