Analyzing the planet at scale with satellite imagery and machine learning is a dream that has been constantly hindered by the cost of difficult-to-access, highly representative, high-resolution imagery. To remedy this, we introduce the WorldStrat dataset: the largest and most varied such publicly available dataset, at the Airbus SPOT 6/7 satellites' high resolution of up to 1.5 m/pixel. Empowered by the European Space Agency's Phi-Lab as part of the ESA-funded QueryPlanet project, we curate nearly 10,000 km² of unique locations to ensure stratified representation of all types of land use across the world: from agriculture to ice caps, from forests to multiple urbanization densities. We also enrich those with locations typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk. We temporally match each high-resolution image with multiple low-resolution images from the freely accessible Sentinel-2 satellites at 10 m/pixel. We accompany this dataset with an open-source Python package to rebuild or extend the WorldStrat dataset, train and infer baseline algorithms, and learn with abundant tutorials, all compatible with the popular EO-learn toolbox. We hope to foster broad-spectrum applications of ML to satellite imagery, and possibly to develop, from free public low-resolution Sentinel-2 imagery, the same power of analysis allowed by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution. The high-resolution Airbus imagery is licensed CC BY-NC, the labels and Sentinel-2 imagery CC BY, and the source code and pre-trained models BSD. The dataset is available at https://zenodo.org/record/6810792 and the software package at https://github.com/worldstrat/worldstrat.
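To make the temporal-matching step concrete, here is a minimal sketch of one plausible way to pair a high-resolution acquisition with several low-resolution revisits: keep the k Sentinel-2 dates closest in time to the SPOT 6/7 date. The helper name `match_revisits`, the default `k=8`, and the nearest-in-time criterion are all hypothetical assumptions for illustration; this is not the WorldStrat package's API or its documented selection rule. A real pipeline would likely also account for factors such as cloud cover, which this sketch ignores. The `SCALE` constant simply works out the upscaling factor implied by the two resolutions quoted above (10 m / 1.5 m ≈ 6.67).

```python
from __future__ import annotations

from datetime import date


def match_revisits(hr_date: date, lr_dates: list[date], k: int = 8) -> list[date]:
    """Return the k low-resolution acquisition dates closest in time to hr_date.

    Hypothetical helper for illustration only; not part of the worldstrat package.
    Ties in time gap are broken by taking the earlier date first.
    """
    if k <= 0:
        return []
    # Rank candidate revisits by absolute gap in days, then chronologically for ties.
    ranked = sorted(lr_dates, key=lambda d: (abs((d - hr_date).days), d))
    # Keep the k closest and return them in chronological order,
    # the order in which a multi-frame model would typically stack them.
    return sorted(ranked[:k])


# Upscaling factor implied by Sentinel-2 (10 m/pixel) vs SPOT 6/7 (1.5 m/pixel).
SCALE = 10 / 1.5

if __name__ == "__main__":
    hr = date(2021, 6, 15)
    lr = [date(2021, 5, 1), date(2021, 6, 10), date(2021, 6, 20),
          date(2021, 7, 30), date(2021, 6, 14)]
    print(match_revisits(hr, lr, k=3))
    print(round(SCALE, 2))
```

With these sample dates the three closest revisits are 10, 14 and 20 June 2021, returned chronologically; returning them in time order rather than by distance keeps the frame stack stable for a downstream multi-frame model.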