This report presents design considerations for automatically generating satellite imagery datasets for training machine learning models, with emphasis on dense classification tasks such as semantic segmentation. The implementation uses freely available Sentinel-2 data, which allows generation of the large-scale datasets required for training deep neural networks. We discuss issues that arise in deep neural network training and evaluation, such as checking the quality of ground truth data, and comment on the scalability of the approach. Accompanying code is available at https://github.com/michaeltrs/DeepSatData.