Multi-domain learning (MDL) refers to learning a set of models simultaneously, each specialized to perform a task in a particular domain. MDL generally demands high labeling effort, since data must be labeled by human experts for every domain. To address this issue, active learning (AL) can be employed to reduce the labeling effort by querying only the most informative data. The resulting paradigm is termed multi-domain active learning (MDAL). However, despite its practical significance, little research exists on MDAL, let alone any off-the-shelf solution. To fill this gap, we construct a simple MDAL pipeline and present a comprehensive comparative study of 30 different MDAL algorithms, built by combining 6 representative MDL models (equipped with various information-sharing schemes) with 5 widely used AL strategies. We evaluate the algorithms on 6 datasets covering textual and visual classification tasks. In most cases, AL brings notable improvements to MDL, and, surprisingly, the naive best-vs-second-best (BvSB) uncertainty strategy performs competitively with state-of-the-art AL strategies. Moreover, among the MDL models, MAN and SDL-joint achieve the top performance when applied to vector features and raw images, respectively. Furthermore, we qualitatively analyze the behaviors of these strategies and models, shedding light on their superior performance in the comparison. Overall, we provide guidelines that can help practitioners choose MDL models and AL strategies for particular applications.
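For readers unfamiliar with the BvSB criterion mentioned above, the following is a minimal sketch of margin-based query selection, not the authors' implementation; the function names bvsb_uncertainty and select_queries are hypothetical illustrations:

import numpy as np

def bvsb_uncertainty(probs: np.ndarray) -> np.ndarray:
    # probs: (n_samples, n_classes) predicted class probabilities.
    # BvSB measures the margin between the two most probable classes;
    # a smaller margin means the model is less certain about the sample.
    top2 = np.sort(probs, axis=1)[:, -2:]   # second-best and best probability per sample
    margin = top2[:, 1] - top2[:, 0]        # best minus second best
    return -margin                          # higher score = more uncertain

def select_queries(probs: np.ndarray, k: int) -> np.ndarray:
    # Query the k unlabeled samples with the smallest BvSB margin
    # (i.e., the highest uncertainty scores).
    scores = bvsb_uncertainty(probs)
    return np.argsort(scores)[-k:]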