Gridded satellite precipitation datasets are useful in hydrological applications because they cover large regions at high spatial density. However, they are not accurate, in the sense that they do not agree with ground-based measurements. An established way to improve their accuracy is to correct them using machine learning algorithms. The correction is posed as a regression problem in which the ground-based measurements are the dependent variable and the satellite data, together with topographic factors (e.g., elevation), are the predictor variables. Most studies of this kind compare only a few machine learning algorithms and are conducted for a small region and a limited time period. Their results are therefore of local importance and do not provide more general guidance or best practices. To provide generalizable results and to contribute to the delivery of best practices, we compare eight state-of-the-art machine learning algorithms in correcting satellite precipitation data for the entire contiguous United States over a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) gridded dataset, together with monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The results suggest that extreme gradient boosting (XGBoost) and random forests are the most accurate in terms of the squared error scoring function. The remaining algorithms are ordered from best to worst as follows: Bayesian regularized feed-forward neural networks, multivariate adaptive polynomial splines (poly-MARS), gradient boosting machines (gbm), multivariate adaptive regression splines (MARS), feed-forward neural networks, and linear regression.
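The regression setup described above can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic, the variable names (`satellite`, `elevation_km`, `gauge`) and the data-generating coefficients are hypothetical, and ordinary linear regression (the simplest of the eight compared algorithms) stands in for the learners, using only the Python standard library. It fits the correction on a training split and compares the mean squared error of the raw and corrected satellite values on a held-out split.

```python
import random

random.seed(0)

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (intercept added)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented system [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], r)) for r in X]

def mse(y_true, y_pred):
    """Mean of the squared error scoring function."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic monthly data (assumption: the satellite product is biased and
# the bias depends on elevation; the coefficients are purely illustrative).
n = 400
satellite = [random.uniform(0.0, 200.0) for _ in range(n)]   # mm/month
elevation_km = [random.uniform(0.0, 3.0) for _ in range(n)]  # km
gauge = [
    max(0.0, 1.3 * s - 5.0 * e + 20.0 + random.gauss(0.0, 5.0))
    for s, e in zip(satellite, elevation_km)
]

# Predictors: satellite precipitation and elevation. Target: gauge value.
X = list(zip(satellite, elevation_km))
X_train, y_train = X[:300], gauge[:300]
X_test, y_test = X[300:], gauge[300:]

beta = fit_linear(X_train, y_train)
mse_raw = mse(y_test, [s for s, _ in X_test])
mse_corrected = mse(y_test, predict(beta, X_test))

print("MSE of raw satellite data:      ", round(mse_raw, 2))
print("MSE of corrected satellite data:", round(mse_corrected, 2))
```

Swapping `fit_linear` and `predict` for another learner while keeping the same split and scoring function gives the kind of like-for-like comparison the abstract describes.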