Deep learning models have frequently been reported to learn shortcuts such as dataset biases. As deep learning plays an increasingly important role in modern healthcare systems, there is a pressing need to combat shortcut learning in medical data and to develop unbiased, trustworthy models. In this paper, we study the problem of developing debiased chest X-ray diagnosis models from biased training data without knowing the exact bias labels. We start from two observations: an imbalanced bias distribution is one of the key causes of shortcut learning, and a model prefers dataset biases when they are easier to learn than the intended features. Based on these observations, we propose a novel algorithm, pseudo bias-balanced learning. It first captures and predicts per-sample bias labels via the generalized cross entropy loss, and then trains a debiased model using the pseudo bias labels and a bias-balanced softmax function. We construct several chest X-ray datasets with various dataset-bias scenarios and demonstrate through extensive experiments that our method achieves consistent improvements over other state-of-the-art approaches.
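The two loss functions named above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not pin down several details, so the following are assumptions:

- The default `q = 0.7` for the generalized cross entropy (GCE).
- A bias-balanced softmax that adds log p(y | b) to the logits during training.
- A hypothetical helper `pseudo_bias_labels` that takes the argmax of a bias-capturing model as the per-sample bias label.

GCE is (1 - p_y^q) / q. Its gradient equals p_y^q times the cross-entropy gradient, so it up-weights samples the model already fits confidently. These are typically the easy, bias-aligned samples, which is why a model trained with GCE tends to capture the bias.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def gce_loss(logits, targets, q=0.7):
    """Generalized cross entropy, mean of (1 - p_y^q) / q over the batch.

    q -> 0 recovers cross entropy and q = 1 gives 1 - p_y (MAE-like).
    q = 0.7 is an assumed default, not a value taken from the paper.
    """
    probs = softmax(logits)
    p_y = probs[np.arange(len(targets)), targets]
    return np.mean((1.0 - p_y ** q) / q)


def pseudo_bias_labels(bias_logits):
    """Hypothetical rule: use the bias-capturing model's argmax as the bias label."""
    return bias_logits.argmax(axis=1)


def class_given_bias_prior(targets, bias_labels, num_classes, num_biases, eps=1e-6):
    """Estimate p(y | b) from (bias, class) co-occurrence counts.

    Returns an array of shape (num_biases, num_classes) whose rows sum to 1.
    """
    counts = np.zeros((num_biases, num_classes))
    np.add.at(counts, (bias_labels, targets), 1.0)
    counts += eps  # avoid log(0) for empty (bias, class) cells
    return counts / counts.sum(axis=1, keepdims=True)


def bias_balanced_ce(logits, targets, bias_labels, prior):
    """Cross entropy on logits shifted by log p(y | b) (bias-balanced softmax)."""
    adjusted = logits + np.log(prior[bias_labels])
    probs = softmax(adjusted)
    p_y = probs[np.arange(len(targets)), targets]
    return -np.mean(np.log(p_y))
```

In this formulation, the log-prior shift is applied only inside the training loss, and the raw logits are used at test time. The debiased model therefore does not get credit for reproducing the class frequencies that correlate with the (pseudo) bias label. With a uniform prior the shift is a constant per row, and the loss reduces to ordinary cross entropy.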