Online active learning is a machine learning paradigm that aims to select the most informative data points to label from a data stream. Minimizing the cost of collecting labeled observations has gained considerable attention in recent years, particularly in real-world applications where data is available only in unlabeled form. Annotating each observation can be time-consuming and costly, which makes it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in recent decades; they select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning selects a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which continuously selects observations for labeling as they arrive in a stream. This work provides an overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities that exist in this area of research. The review is intended to be comprehensive and up to date, and it highlights directions for future work.