Deciding which samples in a large data set should be labeled is one of the most important problems in training deep learning models. So far, a variety of active sample selection strategies related to deep learning have been proposed in the literature. We define them as Active Deep Learning (ADL) only if their predictor is a deep model, where the basic learner is called the predictor and the labeling scheme is called the selector. In this survey, three fundamental factors in selector design are summarized. We categorize ADL into model-driven ADL and data-driven ADL according to whether its selector is model-driven or data-driven. The different characteristics of the two major types of ADL are addressed in detail. Furthermore, the different sub-classes of data-driven and model-driven ADL are summarized and discussed in depth. The advantages and disadvantages of data-driven ADL and model-driven ADL are thoroughly analyzed. We point out that, with the development of deep learning, the selector in ADL is also moving from model-driven to data-driven. Finally, we discuss ADL in terms of its uncertainty, explainability, foundations in cognitive science, etc., and survey the trend of ADL from model-driven to data-driven.
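To make the predictor/selector split concrete, here is a minimal sketch of one hand-designed selector that ranks unlabeled samples using only the predictor's outputs. It assumes entropy-based uncertainty sampling over softmax probabilities, a common active learning criterion chosen here for illustration; it is not claimed to be the survey's own method. The function names `entropy` and `select_most_uncertain` and the probability values are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Hypothetical selector: return the indices of the `budget` unlabeled
    samples whose predicted distributions have the highest entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Assumed predictor outputs (softmax probabilities) for four unlabeled samples.
pool = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]

print(select_most_uncertain(pool, budget=2))  # prints [1, 2]
```

In this sketch the selection rule is fixed by hand, and the predictor only supplies the probabilities; a selector that is instead learned from data would replace `select_most_uncertain` with a trained model.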