Merging satellite products with ground-based measurements is often required to obtain precipitation datasets that cover large regions with high density and are more accurate than pure satellite precipitation products. Machine and statistical learning regression algorithms are regularly used for this purpose. At the same time, tree-based ensemble algorithms for regression are adopted in various fields for solving algorithmic problems with high accuracy and low computational cost. Low computational cost can be a crucial factor when selecting algorithms for satellite precipitation product correction at the daily and finer time scales, where the datasets are particularly large. Still, information on which tree-based ensemble algorithm to select in such a case for the contiguous United States (US) is missing from the literature. In this work, we conduct an extensive comparison of three tree-based ensemble algorithms, specifically random forests, gradient boosting machines (gbm) and extreme gradient boosting (XGBoost), in this context. We use daily data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) and the IMERG (Integrated Multi-satellitE Retrievals for GPM) gridded datasets, together with ground-based precipitation data from the Global Historical Climatology Network daily (GHCNd) database. The experiments cover the entire contiguous US and additionally include linear regression as a benchmark. The results suggest that XGBoost is the best-performing tree-based ensemble algorithm among those compared. They also suggest that IMERG is more useful than PERSIANN in the context investigated.