Online learning algorithms have become a ubiquitous tool in the machine learning toolbox and are frequently used in small, resource-constrained environments. Among the most successful online learning methods are Decision Tree (DT) ensembles. DT ensembles provide excellent performance while adapting to changes in the data, but they are not resource efficient. Incremental tree learners keep adding new nodes to the tree but never remove old ones, increasing memory consumption over time. Gradient-based tree learning, on the other hand, requires the computation of gradients over the entire tree, which is costly for even moderately sized trees. In this paper, we propose Shrub Ensembles (SE), a novel memory-efficient online classification ensemble for resource-constrained systems. Our algorithm trains small to medium-sized decision trees on small windows and uses stochastic proximal gradient descent to learn the ensemble weights of these `shrubs'. We provide a theoretical analysis of our algorithm and an extensive discussion of its behavior in the online setting. In a series of 2,959 experiments on 12 different datasets, we compare our method against 8 state-of-the-art methods. Shrub Ensembles retain excellent performance even when only little memory is available. We show that SE offers a better accuracy-memory trade-off in 7 of 12 cases, while performing statistically significantly better than most other methods. Our implementation is available at https://github.com/sbuschjaeger/se-online .
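For intuition, the following is a minimal sketch of a stochastic proximal gradient step on ensemble weights. It is not the authors' implementation: the loss (here squared error) and the proximal operator (here L1 soft-thresholding, which zeroes out weak shrubs so they can be pruned) are illustrative assumptions, as are all function names and hyperparameters.

```python
import numpy as np

def prox_l1(w, lam):
    """Soft-thresholding: proximal operator of the L1 norm.
    Drives small ensemble weights to exactly zero, so the
    corresponding shrubs can be dropped and their memory freed.
    (Illustrative choice; the paper's actual prox may differ.)"""
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def proximal_sgd_step(w, preds, y, lr=0.1, lam=0.01):
    """One stochastic proximal gradient step on the ensemble weights.

    w     : (M,) current weights of the M shrubs
    preds : (M,) predictions of each shrub for one example
    y     : scalar target for that example
    """
    margin = w @ preds                 # weighted ensemble output
    grad = 2.0 * (margin - y) * preds  # gradient of squared loss w.r.t. w
    w = w - lr * grad                  # gradient step
    return prox_l1(w, lr * lam)        # proximal step

# Hypothetical usage: three shrubs voting on one example.
w = np.array([0.4, 0.3, 0.3])
w = proximal_sgd_step(w, preds=np.array([1.0, -1.0, 1.0]), y=1.0)
```

The appeal of the proximal step in this setting is that sparsity arises as a by-product of the weight update: shrubs whose weight hits zero need not be stored at all, which is how a weight-learning scheme can double as a memory-management scheme.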