Modern analytical systems must be able to process streaming data and respond correctly to changes in the data distribution. Such changes are called concept drift, and they may degrade the quality of the models in use. Because concept drift can appear at any time, the algorithms must continuously adapt the model to the changing data distribution. This work focuses on non-stationary data stream classification with a classifier ensemble. To keep the ensemble up to date, new base classifiers are trained on incoming data blocks and added to the ensemble, while outdated models are removed from it. One of the problems with this type of model is reacting quickly to changes in the data distribution. We propose Chunk Adaptive Restoration, a new framework that can be applied to any block-based data stream classification algorithm. When concept drift is detected, the proposed algorithm adjusts the data chunk size to minimize the impact of the change on the predictive performance of the model. Experiments, backed by statistical tests, show that Chunk Adaptive Restoration significantly reduces the model's restoration time.
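The abstract does not say how the chunk size is adjusted, so the following is only a minimal sketch of one way such a rule could look, not the authors' implementation. It assumes the chunk size drops to a small value when drift is detected and then grows back to its base value over the following chunks. The class name `ChunkSizeController`, the parameters `base_size`, `drift_size`, and `growth`, and their default values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChunkSizeController:
    """Hypothetical helper that picks the size of the next data chunk."""

    base_size: int = 1000   # assumed chunk size while the stream is stable
    drift_size: int = 100   # assumed smaller chunk size right after drift
    growth: float = 2.0     # assumed multiplicative growth per stable chunk

    def __post_init__(self) -> None:
        # Start in the stable regime.
        self.current = self.base_size

    def next_size(self, drift_detected: bool) -> int:
        """Return the size to use for the next chunk."""
        if drift_detected:
            # Shrink so that new base classifiers are trained sooner.
            self.current = self.drift_size
        else:
            # Grow back toward the base size, never exceeding it.
            self.current = min(self.base_size, int(self.current * self.growth))
        return self.current


controller = ChunkSizeController()
drift_flags = [False, True, False, False, False, False]
sizes = [controller.next_size(flag) for flag in drift_flags]
print(sizes)  # [1000, 100, 200, 400, 800, 1000]
```

In this sketch the drift signal would come from whatever drift detector the block-based classifier already uses. Smaller chunks after a drift let the ensemble replace outdated base classifiers more often, at the cost of training each new one on less data.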