Multi-Domain Active Learning: A Comparative Study

Multi-domain learning (MDL) refers to learning a set of models simultaneously, each specialized to perform a task in a particular domain. MDL generally requires high labeling effort, because data must be labeled by human experts for every domain. Active learning (AL), which reduces labeling effort by querying labels only for the most informative data, can address this issue. The resulting paradigm is termed multi-domain active learning (MDAL). However, little research has been done on MDAL so far, and no off-the-shelf solution exists. To fill this gap, we present a comprehensive comparative study of 20 MDAL algorithms, obtained by combining five representative MDL models that use different information-sharing schemes with four widely used AL strategies from different categories. We evaluate the algorithms on five datasets covering textual and visual classification tasks. We find that models which capture both domain-dependent and domain-specific information are more likely to perform well throughout the AL loop. In addition, the simplest informativeness-based uncertainty strategy performs surprisingly well on most datasets. As our off-the-shelf recommendation, the combination of Multinomial Adversarial Networks (MAN) with the best-vs-second-best (BvSB) uncertainty strategy is superior in most cases, and it is also robust across datasets and domains.
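The best-vs-second-best (BvSB) strategy named above can be illustrated with a minimal sketch. It assumes a classifier that outputs class probabilities for each unlabeled sample; the function names `bvsb_margin` and `select_queries` and the example probabilities are hypothetical and are not taken from the study. The sketch also ignores domain structure and simply ranks one pooled set of samples.

```python
from typing import List, Sequence


def bvsb_margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities of one sample."""
    best, second = sorted(probs, reverse=True)[:2]
    return best - second


def select_queries(pool_probs: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Indices of the batch_size samples with the smallest BvSB margin.

    A small margin means the model hesitates between its top two classes,
    so the sample is treated as more informative to label.
    """
    ranked = sorted(range(len(pool_probs)), key=lambda i: bvsb_margin(pool_probs[i]))
    return ranked[:batch_size]


# Hypothetical predicted probabilities for four unlabeled samples, three classes.
pool = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # top two classes nearly tied
    [0.50, 0.30, 0.20],
    [0.60, 0.30, 0.10],
]
queries = select_queries(pool, batch_size=2)
print(queries)
```

In an AL loop, the selected indices would be sent to annotators, the newly labeled samples added to the training set, and the model retrained before the next round of selection.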
