Active learning (AL) attempts to maximize a model's performance gain while labeling as few samples as possible. Deep learning (DL) is data-hungry: it needs large amounts of data to optimize its massive number of parameters so that the model learns to extract high-quality features. In recent years, the rapid development of internet technology has brought about an era of information abundance in which massive amounts of data are available. As a result, DL has attracted strong interest from researchers and has developed rapidly. By comparison, AL has attracted relatively little interest. This is mainly because, before the rise of DL, traditional machine learning required relatively few labeled samples, so early AL struggled to demonstrate its value. Although DL has made breakthroughs in various fields, most of this success is due to the public availability of a large number of existing annotated datasets. However, acquiring a large number of high-quality annotated datasets consumes a lot of manpower, which is infeasible in fields that require a high level of expertise, especially speech recognition, information extraction, and medical imaging. Therefore, AL has gradually received the attention it is due. A natural question is whether AL can be used to reduce the cost of sample annotation while retaining the powerful learning capabilities of DL. Deep active learning (DAL) has emerged in response. Although related research is quite abundant, a comprehensive survey of DAL is still lacking. This article fills that gap: we provide a formal classification method for the existing work, along with a comprehensive and systematic overview. In addition, we analyze and summarize the development of DAL from the perspective of applications. Finally, we discuss the confusion and open problems in DAL and suggest some possible development directions.