In real-world contexts, data are sometimes available in the form of Natural Data Streams, i.e. data characterized by a streaming nature, an unbalanced distribution, data drift over a long time frame, and strong correlation among samples within short time ranges. Moreover, a clear separation between the traditional training and deployment phases is usually lacking. This way of organizing and consuming data represents an interesting and challenging scenario for both traditional Machine and Deep Learning algorithms and incremental learning agents, i.e. agents that are able to incrementally improve their knowledge through past experience. In this paper, we investigate the classification performance of a variety of algorithms from different research fields, i.e. Continual, Streaming and Online Learning, that receive Natural Data Streams as training input. The experimental validation is carried out on three different datasets, expressly organized to replicate this challenging setting.