Stream-based Active Learning with Verification Latency in Non-stationary Environments

Data stream classification is an important problem in machine learning. Because the data are non-stationary, meaning the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert for the labels of the most recent samples, within a limited budget. Existing AL strategies assume that labels are available immediately, whereas in real-world scenarios the expert needs time to provide a queried label (verification latency), and by the time the requested labels arrive they may no longer be relevant. In this article, we investigate the influence of finite, time-variable, and unknown verification latency on AL approaches in the presence of concept drift. We propose PRopagate (PR), a latency-independent utility estimator that also predicts the requested but not yet known labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which distributes the labeling budget non-uniformly over time after a detected drift. We conduct and analyze a thorough experimental evaluation on both synthetic and real-world non-stationary datasets, under different settings of verification latency and budget. We empirically show that the proposed method consistently outperforms state-of-the-art methods. Additionally, we demonstrate that allocating the budget variably over time can boost the performance of AL strategies without increasing the overall labeling budget.
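To make the verification-latency setting and the idea of a drift-dependent budget more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' PR estimator or their budget strategy. The names `dynamic_budget`, `DelayedOracle`, and `labels_with_placeholders` are hypothetical. The sketch assumes an exponentially decaying budget over a fixed window after a detected drift, rescaled so the window's total spend equals that of a constant budget. It also assumes per-query delays chosen by the simulation, with the learner seeing a label only once it has arrived, and predicted placeholder labels for queried samples whose true labels are still pending.

```python
import heapq
import math


def dynamic_budget(t_since_drift, base_budget, window=500, decay=0.01):
    """Hypothetical drift-dependent labeling budget (fraction of samples to query).

    For the first `window` steps after a detected drift, the budget decays
    exponentially and is rescaled so that its sum over the window equals
    base_budget * window, the same total a constant budget would spend.
    After the window it returns to base_budget.
    """
    if t_since_drift >= window:
        return base_budget
    norm = sum(math.exp(-decay * k) for k in range(window))
    return base_budget * window * math.exp(-decay * t_since_drift) / norm


class DelayedOracle:
    """Simulated expert whose labels arrive after a per-query delay
    (verification latency). The delay is chosen by the simulation; the
    learner only sees labels once they have arrived."""

    def __init__(self):
        self._queue = []  # min-heap of (arrival_time, sample_id, label)

    def query(self, now, sample_id, true_label, delay):
        heapq.heappush(self._queue, (now + delay, sample_id, true_label))

    def arrived(self, now):
        """Remove and return all (sample_id, label) pairs due by time `now`."""
        out = []
        while self._queue and self._queue[0][0] <= now:
            _, sample_id, label = heapq.heappop(self._queue)
            out.append((sample_id, label))
        return out

    def pending_ids(self):
        """IDs of queried samples whose labels have not arrived yet."""
        return {sample_id for _, sample_id, _ in self._queue}


def labels_with_placeholders(known, pending, predict):
    """Merge verified labels with predicted placeholder labels for samples that
    were queried but whose labels are still pending.

    known:   dict sample_id -> verified label
    pending: dict sample_id -> features of a queried, not yet labeled sample
    predict: callable features -> predicted label (e.g. the current classifier)
    Returns dict sample_id -> (label, is_verified).
    """
    merged = {sid: (label, True) for sid, label in known.items()}
    for sid, x in pending.items():
        if sid not in merged:  # a verified label always wins over a prediction
            merged[sid] = (predict(x), False)
    return merged


# Usage: two queries with different latencies.
oracle = DelayedOracle()
oracle.query(now=0, sample_id=1, true_label="a", delay=5)
oracle.query(now=1, sample_id=2, true_label="b", delay=2)
print(oracle.arrived(now=3))  # [(2, 'b')]
print(oracle.pending_ids())   # {1}
```

Because the schedule is rescaled rather than clipped, a small `window` or a large `decay` can push the early per-step budget above 1 (more than one query per incoming sample), so a practical version would need to cap it and redistribute the excess.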
