Decision trees are widely used in machine learning and often achieve state-of-the-art performance. Nevertheless, well-known variants such as CART, ID3, random forests, and boosted trees lack a probabilistic formulation that encodes prior assumptions about tree structures and shares statistical strength between node parameters. Existing work on Bayesian decision trees depends on Markov chain Monte Carlo (MCMC), which can be computationally slow, especially for high-dimensional data and expensive proposals. In this study, we propose a method to parallelise a single MCMC decision-tree chain on an ordinary laptop or personal computer, reducing its run time through multi-core processing while keeping the results statistically identical to those of a conventional sequential implementation. We also derive the theoretical and practical reductions in run time that our method can achieve on multi-processor architectures. Experiments show a speed-up of up to 18 times, with the serial and parallel implementations remaining statistically identical.
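The abstract does not detail the parallelisation scheme, but one common way to speed up a single Metropolis-Hastings chain without changing its distribution is speculative proposal evaluation: score a batch of candidate proposals concurrently, consume them sequentially while rejections occur, and discard the rest of the batch as soon as one is accepted (since later candidates were proposed from the old state). The sketch below illustrates this pattern on a toy one-dimensional target; `log_posterior` is a hypothetical stand-in for an expensive tree-posterior evaluation, and a thread pool is used for portability where the paper would use process-level, multi-core workers.

```python
# Sketch (not the paper's implementation): speculative parallel evaluation
# for a single Metropolis-Hastings chain. Candidates past the first
# acceptance are discarded, so the chain's law matches the serial sampler.
import math
import random
from concurrent.futures import ThreadPoolExecutor

def log_posterior(x):
    # Hypothetical stand-in for an expensive tree-posterior evaluation:
    # a standard normal log-density (up to an additive constant).
    return -0.5 * x * x

def mh_chain(n_steps, n_workers=4, seed=0):
    rng = random.Random(seed)
    x, lp = 0.0, log_posterior(0.0)
    samples = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        step = 0
        while step < n_steps:
            # Speculatively draw a batch of proposals from the current state
            # and score them in parallel.
            cands = [x + rng.gauss(0.0, 1.0) for _ in range(n_workers)]
            scores = list(pool.map(log_posterior, cands))
            # Consume sequentially with the usual MH accept/reject.
            for c, s in zip(cands, scores):
                if step >= n_steps:
                    break
                step += 1
                if math.log(rng.random()) < s - lp:
                    x, lp = c, s
                    samples.append(x)
                    # State changed: remaining speculative candidates were
                    # proposed from the old state, so discard them.
                    break
                samples.append(x)
    return samples
```

Because every consumed candidate was proposed from the chain's current state and tested with the standard acceptance rule, the samples are distributed exactly as in the sequential implementation; the parallel work that is discarded after an acceptance is the price paid for the speed-up.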