Missing data is a recurrent problem in remote sensing, mainly due to cloud cover in multispectral images and to acquisition problems. This can be a critical issue for crop monitoring, especially for applications relying on machine learning techniques, which generally assume that the feature matrix has no missing values. This paper proposes a Gaussian Mixture Model (GMM) for the reconstruction of parcel-level features extracted from multispectral images. A robust version of the GMM is also investigated, since datasets can be contaminated by inaccurate samples or features (e.g., wrong crop type reported, inaccurate boundaries, undetected clouds). Additional features extracted from Sentinel-1 Synthetic Aperture Radar (SAR) images are also used to provide complementary information and improve the imputations. The robust GMM investigated in this work assigns reduced weights to outliers during the estimation of the GMM parameters, which improves the final reconstruction. These weights are computed at each step of an Expectation-Maximization (EM) algorithm using outlier scores provided by the isolation forest algorithm. Experimental validation is conducted on rapeseed and wheat parcels located in the Beauce region (France). Overall, we show that the GMM imputation method outperforms the other reconstruction strategies considered. A mean absolute error (MAE) of 0.013 (resp. 0.019) is obtained for the imputation of the median Normalized Difference Vegetation Index (NDVI) of the rapeseed (resp. wheat) parcels. Other indicators (e.g., the Normalized Difference Water Index) and statistics (e.g., the interquartile range, which captures the heterogeneity of an indicator within a parcel) are reconstructed simultaneously with good accuracy. In a dataset contaminated by irrelevant samples, using the robust GMM is recommended, since the standard GMM imputation can lead to inaccurate imputed values.
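The GMM imputation and its robust variant can be illustrated with a minimal NumPy sketch. It assumes the per-sample weights are already given (in the paper they are derived from isolation forest outlier scores at each EM iteration; that step is not reproduced here), fits the mixture on complete rows only, and uses a simple farthest-point initialization; the function names `fit_weighted_gmm` and `impute` are hypothetical. Missing entries are replaced by the conditional expectation E[x_m | x_o], i.e. a posterior-weighted average of the per-component conditional Gaussian means.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of X (shape n x d)."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)        # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)                # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def fit_weighted_gmm(X, K, weights, n_iter=100, reg=1e-6, seed=0):
    """EM for a full-covariance GMM in which sample i contributes with weight w_i.

    Small weights on suspected outliers limit their influence on the estimated
    proportions, means and covariances. Assumption: the weights are held fixed
    here, whereas the paper recomputes them at each EM step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.asarray(weights, dtype=float)
    # Farthest-point initialization of the means (a simple heuristic).
    mus = [X[rng.integers(n)]]
    for _ in range(1, K):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in mus], axis=0)
        mus.append(X[np.argmax(dist)])
    mus = np.array(mus)
    covs = np.array([np.eye(d) * X.var(axis=0).mean() for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r_ik, computed in log space for stability.
        log_r = np.column_stack(
            [np.log(pis[k]) + gaussian_logpdf(X, mus[k], covs[k]) for k in range(K)]
        )
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: every sufficient statistic is multiplied by the sample weight w_i.
        wr = r * w[:, None]
        nk = wr.sum(axis=0)
        pis = nk / nk.sum()
        mus = (wr.T @ X) / nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            covs[k] = (wr[:, k, None] * diff).T @ diff / nk[k] + reg * np.eye(d)
    return pis, mus, covs


def impute(x, pis, mus, covs):
    """Replace the NaNs of x by E[x_m | x_o] under the fitted GMM."""
    x = np.asarray(x, dtype=float).copy()
    m = np.isnan(x)
    o = ~m
    if not m.any():
        return x
    if not o.any():
        x[:] = pis @ mus                         # nothing observed: mixture mean
        return x
    log_p, cond_means = [], []
    for k in range(len(pis)):
        S_oo = covs[k][np.ix_(o, o)]
        S_mo = covs[k][np.ix_(m, o)]
        # Posterior weight of component k given the observed coordinates only.
        log_p.append(np.log(pis[k]) + gaussian_logpdf(x[o][None, :], mus[k][o], S_oo)[0])
        # Conditional Gaussian mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o).
        cond_means.append(mus[k][m] + S_mo @ np.linalg.solve(S_oo, x[o] - mus[k][o]))
    log_p = np.array(log_p)
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    x[m] = p @ np.array(cond_means)
    return x
```

Passing `weights=np.ones(n)` recovers the standard GMM; giving small weights to samples flagged as outliers (for example, from an isolation forest score) mimics the down-weighted estimation described above.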