XGBoost stands for eXtreme Gradient Boosting. It is a C++ implementation of the Gradient Boosting Machine that automatically exploits CPU multithreading for parallel tree construction, and it adds algorithmic improvements that raise predictive accuracy.
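As a minimal sketch of the multithreading point above, the following Python snippet trains a small XGBoost model and sets the nthread parameter, which controls how many CPU threads are used when building trees; the dataset and hyperparameter values are placeholders, not recommendations.

```python
# Illustrative only: train a small XGBoost model and control CPU parallelism
# via the `nthread` parameter (assumes the `xgboost` and `scikit-learn`
# packages are installed; data and hyperparameters are placeholders).
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=20, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "reg:squarederror",
    "max_depth": 6,
    "eta": 0.1,
    "nthread": 4,  # number of CPU threads used to build trees in parallel
}
booster = xgb.train(params, dtrain, num_boost_round=100)
preds = booster.predict(dtrain)
```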


Next-Generation Machine Learning with Spark provides access to real-world documentation and examples for the Spark platform to build large-scale, enterprise-grade machine learning applications.

Machine learning has made a series of astonishing advances over the past decade, and these breakthroughs are affecting our daily lives and every industry. The book offers an introduction to Spark and Spark MLlib and then moves beyond the standard Spark MLlib library to more powerful third-party machine learning algorithms and libraries. By the end of the book, you will be able to apply your knowledge to real-world use cases through numerous practical examples and insightful explanations.

  • Introduction to machine learning, Spark, and Spark MLlib 2.4.x
  • Achieve lightning-fast gradient boosting on Spark with the XGBoost4J-Spark and LightGBM libraries (see the sketch after this list)
  • Detect anomalies with the Isolation Forest algorithm for Spark
  • Use the Spark NLP and Stanford CoreNLP libraries, which support multiple languages
  • Optimize ML workloads with the Alluxio in-memory data accelerator for Spark
  • Perform graph analysis with GraphX and GraphFrames
  • Perform image recognition with convolutional neural networks
  • Use the Keras framework and distributed deep learning libraries with Spark
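As a rough illustration of the gradient-boosting-on-Spark item above, the snippet below uses Spark MLlib's built-in GBTClassifier as a stand-in for the third-party XGBoost4J-Spark and LightGBM integrations the book covers; the DataFrame columns and parameter values are assumptions for illustration only.

```python
# Hypothetical sketch: gradient-boosted trees on Spark via PySpark's built-in
# GBTClassifier (used here as a stand-in for XGBoost4J-Spark / LightGBM, which
# are third-party packages). Column names and parameters are illustrative.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import GBTClassifier

spark = SparkSession.builder.appName("gbt-sketch").getOrCreate()

# Toy DataFrame with two numeric features and a binary label.
df = spark.createDataFrame(
    [(0.5, 1.2, 0), (1.3, 0.7, 1), (2.1, 3.3, 0), (0.2, 0.9, 1)],
    ["f1", "f2", "label"],
)
assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
train = assembler.transform(df)

gbt = GBTClassifier(featuresCol="features", labelCol="label", maxIter=20)
model = gbt.fit(train)
model.transform(train).select("label", "prediction").show()
```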

Who this book is for

Data scientists and machine learning engineers who want to take their knowledge to the next level with Spark and more powerful next-generation algorithms and libraries beyond those available in the standard Spark MLlib library; it is also a primer for aspiring data scientists and engineers who need an introduction to machine learning, Spark, and Spark MLlib.


Latest content

Employing a large dataset (at most on the order of n = 10^6), this study attempts to enhance the literature comparing regression- and machine learning (ML)-based rent price prediction models by adding new empirical evidence and accounting for the spatial dependence of the observations. The regression-based approach incorporates the nearest neighbor Gaussian processes (NNGP) model, enabling the application of kriging to large datasets. In contrast, the ML-based approach uses typical models: extreme gradient boosting (XGBoost), random forest (RF), and a deep neural network (DNN). The out-of-sample prediction accuracy of these models was compared using Japanese apartment rent data with sample sizes of varying order (i.e., n = 10^4, 10^5, 10^6). The results showed that, as the sample size increased, XGBoost and RF outperformed NNGP in out-of-sample prediction accuracy. XGBoost achieved the highest prediction accuracy for all sample sizes and error measures, on both logarithmic and real scales, and for all price bands (when n = 10^5 and 10^6). A comparison of several methods for accounting for the spatial dependence in RF showed that simply adding spatial coordinates to the explanatory variables may be sufficient.
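As a hedged illustration of the abstract's closing point, that adding spatial coordinates as ordinary features may be enough for tree-based models, the sketch below trains a random forest on synthetic rent data with longitude and latitude included among the explanatory variables; all column names, value ranges, and hyperparameters are assumptions, not taken from the paper.

```python
# Hypothetical sketch: account for spatial dependence in a tree-based rent
# model by adding longitude/latitude as plain features. Data, column names,
# and hyperparameters are illustrative, not taken from the paper.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "floor_area": rng.uniform(15, 80, n),
    "age": rng.uniform(0, 40, n),
    "lon": rng.uniform(139.5, 139.9, n),  # spatial coordinates as features
    "lat": rng.uniform(35.5, 35.8, n),
})
# Synthetic log-rent with a smooth spatial trend plus structural effects.
df["log_rent"] = (
    0.8 * np.log(df["floor_area"]) - 0.01 * df["age"]
    + 2.0 * np.sin(10 * df["lon"]) * np.cos(10 * df["lat"])
    + rng.normal(0, 0.1, n)
)

X = df[["floor_area", "age", "lon", "lat"]]
y = df["log_rent"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_tr, y_tr)
rmse = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
print(f"out-of-sample RMSE (log scale): {rmse:.3f}")
```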
