Learning from non-stationary data streams is attracting growing interest as more data becomes available in the form of streams, for example from social media, smartphones, or industrial process monitoring. Most approaches assume that the ground-truth label of each sample becomes available (possibly with some delay) and perform supervised online learning in the test-then-train scheme, in which each sample is first used to test the model and then to update it. While this assumption may hold in some scenarios, it does not apply to all settings. In this work, we focus on scarcely labeled data streams and explore the potential of self-labeling in gradually drifting data streams. We formalize this setup and propose a novel online $k$-NN classifier that combines self-labeling and demand-based active learning.