We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given $\epsilon > 0$, our algorithm guarantees that, at every point in time, every node of the decision tree uses a split whose Gini gain is within an additive $\epsilon$ of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of $O\big(\frac{d \log^3 n}{\epsilon^2}\big)$, which improves to $O\big(\frac{d \log^2 n}{\epsilon}\big)$ for binary or categorical features. It uses space $O(n d)$, where $n$ is the maximum number of examples at any point in time and $d$ is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees uses amortized running time $\Omega(d)$ and space $\tilde{\Omega}(n d)$. We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.
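To make the per-node guarantee concrete, the following is a minimal static sketch of what "Gini gain within an additive $\epsilon$ of the optimum" means for a single real-valued feature. It is not the dynamic algorithm itself: it recomputes everything from scratch, and it assumes the standard definition of Gini gain (parent impurity minus the size-weighted impurity of the two children), which may differ in normalization from the one used in the paper. The function names `gini`, `gini_gain`, `best_gain`, and `is_eps_optimal` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_gain(xs, ys, threshold):
    """Gain of splitting on `x <= threshold` (assumed standard definition)."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return gini(ys) - len(left) / n * gini(left) - len(right) / n * gini(right)


def best_gain(xs, ys):
    """Optimum gain over all candidate thresholds (brute force, static)."""
    return max(gini_gain(xs, ys, t) for t in set(xs))


def is_eps_optimal(xs, ys, threshold, eps):
    """True if the split's gain is within an additive eps of the optimum."""
    return gini_gain(xs, ys, threshold) >= best_gain(xs, ys) - eps


# Four examples of one feature; the best threshold separates the classes.
xs = [1, 2, 3, 4]
ys = [0, 0, 1, 1]
print(gini_gain(xs, ys, 2))            # gain of the perfect split
print(is_eps_optimal(xs, ys, 1, 0.1))  # a poor split with a small eps
print(is_eps_optimal(xs, ys, 1, 0.4))  # the same split with a larger eps
```

In this toy instance the split at threshold 2 attains the optimum, while the split at threshold 1 is accepted only when $\epsilon$ is large enough to cover the gap. The brute-force `best_gain` takes time quadratic in the number of examples; it serves only to state the guarantee and does not reflect the amortized update bounds above.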