Bayesian Additive Regression Trees (BART) is a Bayesian nonparametric approach that has been shown to be competitive with the best modern predictive methods such as random forests and gradient boosted decision trees. The sum-of-trees structure combined with a Bayesian inferential framework provides an accurate and robust statistical method. A BART variant named SBART, which uses randomized decision trees, has been developed and shows practical benefits compared to BART. The primary bottleneck of SBART is the time required to compute the sufficient statistics, and the publicly available implementation of the SBART algorithm in the R package is very slow. In this paper we show how the SBART algorithm can be modified and computed using single program, multiple data (SPMD) distributed computation with the Message Passing Interface (MPI) library. This approach scales nearly linearly in the number of processor cores, enabling the practitioner to perform statistical inference on massive datasets. Our approach can also handle datasets too massive to fit in any single data repository. We have modified the algorithm to handle classification problems, which cannot be done with the original R package. With data experiments we show the advantage of distributed SBART for classification problems compared to BART.
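To make the SPMD idea concrete, the following is a minimal sketch (our illustration under assumed details, not the authors' implementation): each MPI rank holds a shard of the observations, computes per-leaf sufficient statistics (residual sums and counts) for a tree locally, and the partial results are combined with a single MPI_Allreduce per statistic. The leaf assignments and residuals here are toy stand-ins for what SBART's randomized decision trees would produce.

```cpp
// Sketch of SPMD sufficient-statistic computation with MPI.
// Hypothetical data layout: each rank owns n_local observations.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_leaves = 4;    // hypothetical tree with 4 leaves
    const int n_local  = 1000; // observations held by this rank

    // Toy local shard: residuals and leaf assignments (stand-ins for real data).
    std::vector<double> residual(n_local, 0.1 * (rank + 1));
    std::vector<int> leaf(n_local);
    for (int i = 0; i < n_local; ++i) leaf[i] = i % n_leaves;

    // Local sufficient statistics: per-leaf residual sums and counts.
    std::vector<double> local_sum(n_leaves, 0.0), global_sum(n_leaves);
    std::vector<double> local_cnt(n_leaves, 0.0), global_cnt(n_leaves);
    for (int i = 0; i < n_local; ++i) {
        local_sum[leaf[i]] += residual[i];
        local_cnt[leaf[i]] += 1.0;
    }

    // One collective per statistic: communication cost depends on the number
    // of leaves, not on the data size, which is why the approach can scale
    // nearly linearly with the number of cores.
    MPI_Allreduce(local_sum.data(), global_sum.data(), n_leaves,
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    MPI_Allreduce(local_cnt.data(), global_cnt.data(), n_leaves,
                  MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int l = 0; l < n_leaves; ++l)
            std::printf("leaf %d: sum=%.2f count=%.0f\n",
                        l, global_sum[l], global_cnt[l]);
    }
    MPI_Finalize();
    return 0;
}
```

Because every rank ends the reduction with the same global statistics, each core can draw the same leaf-parameter updates without shipping raw observations between machines, which is what allows datasets too large for any single repository to be handled.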