A central aspect of online decision tree solutions is evaluating the incoming data and enabling model growth. To do so, trees must deal with different kinds of input features and partition them to learn from the data. Numerical features are no exception, and they pose additional challenges compared to other kinds of features, as there is no trivial strategy for choosing the best point at which to split. The problem is even more challenging in regression tasks because both the features and the target are continuous. Typical online solutions store and evaluate all the points observed between split attempts, which conflicts with the constraints of real-time applications. In this paper, we introduce the Quantization Observer (QO), a simple yet effective hashing-based algorithm to monitor and evaluate split point candidates in numerical features for online tree regressors. QO can be easily integrated into incremental decision trees, such as Hoeffding Trees, and it has a monitoring cost of $O(1)$ per instance and a sub-linear cost to evaluate split candidates. Previous solutions had an $O(\log n)$ cost per insertion (in the best case) and a linear cost to evaluate split points. Our extensive experiments highlight QO's effectiveness in providing accurate split point suggestions while using much less memory and processing time than its competitors.
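To make the idea of a hashing-based observer concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the hash function, the stored statistics, or the split criterion, so everything here is an assumption made for illustration: the class name `QuantizationObserverSketch`, the `radius` parameter, the slot key `floor(x / radius)`, the per-slot running sums, and the use of variance reduction as the split merit.

```python
import math
from dataclasses import dataclass


@dataclass
class SlotStats:
    """Running sums for one quantization slot (hypothetical helper)."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_y2: float = 0.0

    def add(self, x, y):
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_y2 += y * y


def _sse(n, sum_y, sum_y2):
    """Sum of squared deviations from the mean, computed from running sums."""
    return sum_y2 - sum_y * sum_y / n if n > 0 else 0.0


class QuantizationObserverSketch:
    """Illustrative observer: hashes feature values into fixed-width slots."""

    def __init__(self, radius=0.25):
        self.radius = radius  # assumed quantization width
        self.slots = {}       # slot key -> SlotStats

    def update(self, x, y):
        # One hash lookup per instance; no per-point storage.
        key = math.floor(x / self.radius)
        slot = self.slots.get(key)
        if slot is None:
            slot = self.slots[key] = SlotStats()
        slot.add(x, y)

    def best_split(self):
        """Return (threshold, merit) with the highest variance reduction, or None."""
        keys = sorted(self.slots)
        if len(keys) < 2:
            return None
        total_n = sum(s.n for s in self.slots.values())
        total_y = sum(s.sum_y for s in self.slots.values())
        total_y2 = sum(s.sum_y2 for s in self.slots.values())
        total_sse = _sse(total_n, total_y, total_y2)

        best = None
        left_n, left_y, left_y2 = 0, 0.0, 0.0
        # Sweep the slots in key order; each boundary is one candidate.
        for key, next_key in zip(keys, keys[1:]):
            slot, nxt = self.slots[key], self.slots[next_key]
            left_n += slot.n
            left_y += slot.sum_y
            left_y2 += slot.sum_y2
            split_sse = _sse(left_n, left_y, left_y2) + _sse(
                total_n - left_n, total_y - left_y, total_y2 - left_y2
            )
            merit = (total_sse - split_sse) / total_n
            # Candidate threshold: midpoint between the mean x of adjacent slots.
            threshold = 0.5 * (slot.sum_x / slot.n + nxt.sum_x / nxt.n)
            if best is None or merit > best[1]:
                best = (threshold, merit)
        return best
```

In this sketch, `update` performs a single dictionary access per instance, and `best_split` visits one candidate per occupied slot rather than one per observed point, so its cost depends on the number of slots instead of the number of instances. The choice of `radius` trades split resolution against memory: a larger value yields fewer slots and coarser candidate thresholds.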