In recent years, deep learning models have become increasingly popular. However, their deployment is still precluded in contexts where supervised data is scarce and manual labelling is expensive. Active learning strategies aim at solving this problem by requesting supervision only on a few unlabelled samples, namely those that most improve model performance once added to the training set. Most strategies are based on uncertain sample selection, and are often even restricted to samples lying close to the decision boundary. Here we propose a very different approach that takes domain knowledge into consideration. Indeed, in the case of multi-label classification, the relationships among classes offer a way to spot incoherent predictions, i.e., predictions on which the model most likely needs supervision. We have developed a framework where first-order-logic knowledge is converted into constraints, and checking their violation serves as a natural guide for sample selection. We empirically demonstrate that the knowledge-driven strategy outperforms standard strategies, particularly on datasets where the domain knowledge is complete. Furthermore, we show how the proposed approach enables the discovery of data distributions lying far from the training data. Finally, the proposed knowledge-driven strategy can also be easily applied to object-detection problems, where standard uncertainty-based techniques are difficult to apply.
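The selection mechanism described above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's exact formulation: it assumes a toy rule "dog implies animal" translated with the product t-norm, hypothetical per-class prediction scores, and an illustrative `select_for_labelling` helper that ranks unlabelled samples by constraint violation.

```python
def implication_violation(p_dog, p_animal):
    """Violation of the rule 'dog -> animal': large when the model
    predicts dog but not animal. Product t-norm translation (an
    illustrative choice): violation = p_dog * (1 - p_animal)."""
    return p_dog * (1.0 - p_animal)

def select_for_labelling(predictions, k):
    """predictions: list of (sample_id, p_dog, p_animal) tuples.
    Returns the k sample ids whose predictions most violate the
    logic constraint, i.e., the most incoherent ones."""
    scored = sorted(predictions,
                    key=lambda t: implication_violation(t[1], t[2]),
                    reverse=True)
    return [sample_id for sample_id, _, _ in scored[:k]]

preds = [("a", 0.9, 0.1),   # dog without animal: strongly incoherent
         ("b", 0.8, 0.9),   # coherent prediction
         ("c", 0.2, 0.1)]   # low dog confidence: mild violation
print(select_for_labelling(preds, 2))  # → ['a', 'c']
```

Samples with incoherent predictions ("a" and "c" here) are the ones sent to the annotator, replacing the usual uncertainty-based ranking.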