Clinical settings are often characterized by abundant unlabelled data and limited labelled data. This is typically driven by the high burden placed on oracles (e.g., physicians) to provide annotations. One way to mitigate this burden is active learning (AL), which involves (a) the acquisition and (b) the annotation of informative unlabelled instances. Whereas previous work addresses these two elements independently, we propose an AL framework that addresses both. For acquisition, we propose Bayesian Active Learning by Consistency (BALC), a sub-framework which perturbs both instances and network parameters and quantifies the resulting changes in the network output probability distribution. For annotation, we propose SoQal, a sub-framework that dynamically determines, for each acquired unlabelled instance, whether to request a label from an oracle or to pseudo-label it instead. We show that BALC can outperform state-of-the-art acquisition functions such as BALD, and that SoQal outperforms baseline methods even in the presence of a noisy oracle.
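The two steps described above (consistency-based acquisition, then a per-instance choice between oracle and pseudo-label) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the linear softmax model, the Gaussian noise used for both perturbations, the KL divergence as the measure of change, the function names, and in particular the fixed confidence threshold, which is only a simple stand-in for SoQal's dynamic decision rule.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) per row; eps guards against log(0).
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def consistency_scores(x, weights, rng, n_samples=20,
                       input_noise=0.1, param_noise=0.05):
    # Hypothetical consistency score: for each sampled parameter
    # perturbation, compare the output distribution on the original
    # instance with the one on a perturbed instance, then average.
    scores = np.zeros(x.shape[0])
    for _ in range(n_samples):
        # Assumption: Gaussian noise stands in for parameter perturbation.
        w = weights + rng.normal(0.0, param_noise, size=weights.shape)
        # Assumption: Gaussian noise stands in for instance perturbation.
        x_pert = x + rng.normal(0.0, input_noise, size=x.shape)
        scores += kl_divergence(softmax(x @ w), softmax(x_pert @ w))
    return scores / n_samples


def acquire(scores, k):
    # Indices of the k instances whose outputs changed the most.
    return np.argsort(scores)[::-1][:k]


def annotation_decision(probs, threshold=0.9):
    # Stand-in rule (not SoQal's): pseudo-label confident predictions,
    # send the rest to the oracle.
    return np.where(probs.max(axis=-1) >= threshold, "pseudo-label", "oracle")


rng = np.random.default_rng(0)
x_unlabelled = rng.normal(size=(100, 8))  # synthetic unlabelled pool
weights = rng.normal(size=(8, 3))         # toy 3-class linear model
scores = consistency_scores(x_unlabelled, weights, rng)
chosen = acquire(scores, k=10)
decisions = annotation_decision(softmax(x_unlabelled[chosen] @ weights))
```

Instances whose predictions shift most under perturbation are treated here as the most informative, and the threshold controls how much of the annotation load is shifted away from the oracle.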