Database research can improve machine learning performance in several ways, one of which is designing better data structures. This paper combines incremental computation with sequential and probabilistic filtering to enable "forgetful" tree-based learning algorithms to cope with concept drift, i.e., data whose mapping from input to classification changes over time. The forgetful algorithms described in this paper run fast while maintaining high-quality predictions on streaming data. Specifically, the algorithms are up to 24 times faster than state-of-the-art incremental algorithms with at most a 2% loss of accuracy, or at least twice as fast without any loss of accuracy. This makes such structures suitable for high-volume streaming applications.
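To make the idea of forgetting under concept drift concrete, the following is a minimal sketch and not the tree-based algorithms of this paper. It assumes a toy binary stream whose labeling rule flips halfway through, and a hypothetical majority-vote learner (`WindowedMajority`) that keeps incremental counts over a sliding window. The class name, the window size, and the stream generator are all illustrative assumptions.

```python
from collections import Counter, deque


class WindowedMajority:
    """Toy learner (hypothetical, for illustration only): predicts the label
    seen most often for an input among the examples it still remembers."""

    def __init__(self, window=None):
        self.window = window      # None means the learner never forgets
        self.buffer = deque()     # remembered examples, oldest first
        self.counts = Counter()   # (x, y) -> occurrences among remembered examples

    def predict(self, x):
        return 1 if self.counts[(x, 1)] > self.counts[(x, 0)] else 0

    def learn(self, x, y):
        # Incremental update: adjust counts instead of retraining from scratch.
        self.counts[(x, y)] += 1
        if self.window is None:
            return
        self.buffer.append((x, y))
        if len(self.buffer) > self.window:
            # Forget the oldest example by undoing its contribution.
            self.counts[self.buffer.popleft()] -= 1


def drifting_stream(n=2000, drift_at=1000):
    """Assumed synthetic stream: y = x before the drift point, y = 1 - x after."""
    for t in range(n):
        x = t % 2
        yield x, (x if t < drift_at else 1 - x)


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: predict each example before learning from it."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)
        model.learn(x, y)
        total += 1
    return correct / total


if __name__ == "__main__":
    forgetful = prequential_accuracy(WindowedMajority(window=50), drifting_stream())
    unbounded = prequential_accuracy(WindowedMajority(window=None), drifting_stream())
    print(f"forgetful: {forgetful:.3f}, never forgets: {unbounded:.3f}")
```

In this toy setting the windowed learner recovers shortly after the drift, while the learner that never forgets stays anchored to the outdated rule. The window size of 50 is an arbitrary choice; a smaller window adapts faster but retains less evidence.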