Data-Assemble: Leveraging Multiple Datasets with Partial Labels

The success of deep learning relies heavily on large and diverse datasets with extensive labels, but we often only have access to several small datasets associated with partial labels. In this paper, we launch a new initiative, "Data-Assemble", that aims to unleash the full potential of partially labeled data from an assembly of public datasets. Specifically, we introduce a new dynamic adapter to encode different visual tasks, which addresses the challenges of incomparable, heterogeneous, or even conflicting labeling protocols. We also employ pseudo-labeling and consistency constraints to harness data with missing labels and to mitigate the domain gap across datasets. In rigorous evaluations on three natural imaging and six medical imaging tasks, we find that learning from "negative examples" facilitates both classification and segmentation of classes of interest. This sheds new light on the computer-aided diagnosis of rare diseases and emerging pandemics, wherein "positive examples" are hard to collect, yet "negative examples" are relatively easy to assemble. Besides exceeding prior art on the ChestXray benchmark, our model is particularly strong in identifying diseases of minority classes, yielding an improvement of over 3 points on average. Remarkably, when using existing partial labels, our model performs on par with one trained on full labels, eliminating the need for an additional 40% of annotation costs. Code will be made available at https://github.com/MrGiovanni/DataAssemble.
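
To make the idea of training on partial labels concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the loss function, so the sketch assumes a per-class binary cross-entropy that skips classes a dataset never annotated, and a simple confidence-threshold rule for pseudo-labeling. The function names `masked_bce` and `pseudo_label` and the thresholds 0.1 and 0.9 are hypothetical choices made for illustration only.

```python
import math


def masked_bce(probs, labels):
    """Mean binary cross-entropy over annotated classes only.

    probs  : predicted probabilities, one per class
    labels : 1 (positive), 0 (negative), or None when the source
             dataset did not annotate that class (assumed encoding)
    """
    eps = 1e-7  # clamp to avoid log(0)
    total, count = 0.0, 0
    for p, y in zip(probs, labels):
        if y is None:
            continue  # missing label: contributes nothing to the loss
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        count += 1
    return total / count if count else 0.0


def pseudo_label(prob, low=0.1, high=0.9):
    """Turn a confident prediction into a hard label, else leave it missing.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    if prob >= high:
        return 1
    if prob <= low:
        return 0
    return None


# One sample from a dataset that annotates only the first of three classes.
probs = [0.9, 0.2, 0.5]
labels = [1, None, None]
loss_partial = masked_bce(probs, labels)

# Fill the missing entries with pseudo-labels where the model is confident.
filled = [y if y is not None else pseudo_label(p) for p, y in zip(probs, labels)]
```

Under these assumptions, a sample labeled only for one class still yields a gradient signal for that class, and a confident negative on another class can later be promoted to a pseudo-label, which is one plausible way "negative examples" from other datasets could contribute to training.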
