Comparison of machine learning algorithms for merging gridded satellite and earth-observed precipitation data

Gridded satellite precipitation datasets are useful in hydrological applications because they cover large regions with high density. However, they are not accurate, in the sense that they do not agree with ground-based measurements. An established means of improving their accuracy is to correct them using machine learning algorithms. This correction takes the form of a regression problem, in which the ground-based measurements are the dependent variable and the satellite data, together with topographic factors (e.g., elevation), are the predictor variables. Most studies of this kind involve a limited number of machine learning algorithms and are conducted for a small region and a limited time period. Thus, their results are of local importance and do not provide more general guidance and best practices. To provide results that are generalizable and to contribute to the delivery of best practices, we here compare eight state-of-the-art machine learning algorithms in correcting satellite precipitation data for the entire contiguous United States and for a 15-year period. We use monthly data from the PERSIANN (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks) gridded dataset, together with monthly earth-observed precipitation data from the Global Historical Climatology Network monthly database, version 2 (GHCNm). The results suggest that extreme gradient boosting (XGBoost) and random forests are the most accurate in terms of the squared error scoring function. The remaining algorithms can be ordered from best to worst as follows: Bayesian regularized feed-forward neural networks, multivariate adaptive polynomial splines (poly-MARS), gradient boosting machines (gbm), multivariate adaptive regression splines (MARS), feed-forward neural networks, and linear regression.
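The regression framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' code and does not use PERSIANN or GHCNm data. It assumes synthetic monthly values in which a hypothetical gauge reading depends linearly on a satellite estimate and on elevation, fits only linear regression (the simplest of the compared algorithms), and scores the corrected and uncorrected satellite values with the mean squared error. All names, coefficients, and sample sizes are illustrative assumptions.

```python
import random

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(beta, X):
    """Apply fitted coefficients (intercept first) to predictor rows."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], x)) for x in X]

def mse(y_true, y_pred):
    """Mean squared error, the scoring function named in the abstract."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

# Synthetic data (assumption): the gauge value is a biased, noisy
# function of the satellite estimate (mm) and station elevation (km).
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    satellite_mm = rng.uniform(0.0, 200.0)
    elevation_km = rng.uniform(0.0, 3.0)
    gauge_mm = 5.0 + 0.8 * satellite_mm + 10.0 * elevation_km + rng.gauss(0.0, 5.0)
    X.append((satellite_mm, elevation_km))
    y.append(gauge_mm)

# Hold out the last 100 samples for evaluation.
X_train, y_train = X[:400], y[:400]
X_test, y_test = X[400:], y[400:]

beta = fit_linear(X_train, y_train)

# Baseline: use the raw satellite value as the precipitation estimate.
mse_raw = mse(y_test, [x[0] for x in X_test])
# Corrected: regression on the satellite value plus elevation.
mse_corrected = mse(y_test, predict(beta, X_test))

print(f"raw satellite MSE:    {mse_raw:.1f}")
print(f"corrected (OLS) MSE:  {mse_corrected:.1f}")
```

Replacing `fit_linear` and `predict` with another regressor and comparing the resulting `mse_corrected` values on the same held-out set would mirror, in miniature, the kind of comparison the abstract describes.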

