Predictions in the form of probability distributions are crucial for effective decision-making. Quantile regression enables such predictions within spatial prediction settings that aim to create improved precipitation datasets by merging remote sensing and gauge data. However, ensemble learning of quantile regression algorithms remains unexplored in this context and, at the same time, it has not been substantially developed so far in the broader machine learning research landscape. Here, we introduce nine quantile-based ensemble learners and address the afore-mentioned gap in precipitation dataset creation by presenting the first application of these learners to large precipitation datasets. We employed a novel feature engineering strategy, reducing predictors to distance-weighted satellite precipitation at relevant locations, combined with location elevation. Our ensemble learners include six ensemble learning and three simple methods (mean, median, best combiner), combining six individual algorithms: quantile regression (QR), quantile regression forests (QRF), generalized random forests (GRF), gradient boosting machines (GBM), light gradient boosting machines (LightGBM), and quantile regression neural networks (QRNN). These algorithms serve as both base learners and combiners within different ensemble learning methods. We evaluated performance against a reference method (QR) using quantile scoring functions in a large dataset comprising 15 years of monthly gauge-measured and satellite precipitation in the contiguous United States (CONUS). Ensemble learning with QR and QRNN yielded the best results across quantile levels ranging from 0.025 to 0.975, outperforming the reference method by 3.91% to 8.95%. This demonstrates the potential of ensemble learning to improve probabilistic spatial predictions.
翻译:暂无翻译