It has been rightly emphasized that the use of AI for clinical decision making could amplify health disparities. A machine learning model may pick up undesirable correlations, for example between a patient's racial identity and clinical outcomes. Such correlations are often present in the (historical) data used for model development. A growing number of studies report biases in image-based disease detection models. Yet, beyond the scarcity of data from underserved populations, little is known about how these biases are encoded or how disparate performance might be reduced or even removed. One concern is that an algorithm may recognize patient characteristics such as biological sex or racial identity and then use this information, directly or indirectly, when making predictions. It remains unclear, however, how to establish whether such information is actually used. This article aims to shed light on these issues by exploring different methodologies for assessing the inner workings of disease detection models. We explore multitask learning and model inspection to assess the relationship between protected characteristics and the prediction of disease. We believe our analysis framework could provide valuable insights for future studies in medical imaging AI. Our findings also call for further research to better understand the underlying causes of performance disparities.
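The abstract does not spell out how the multitask probe is set up. As a minimal sketch of one common formulation, the snippet below trains a shared image encoder with two heads, one predicting disease and one predicting a protected characteristic; comparing how well the protected-attribute head performs on the shared features is one way to probe whether those features encode protected information. The backbone choice (ResNet-18), the class counts, and the loss weight `alpha` are illustrative assumptions, not the authors' actual configuration.

```python
import torch
import torch.nn as nn
from torchvision import models


class MultiTaskDetector(nn.Module):
    """Shared encoder with a disease head and a protected-attribute head."""

    def __init__(self, num_diseases: int, num_protected_classes: int):
        super().__init__()
        backbone = models.resnet18(weights=None)  # illustrative backbone choice
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # expose the pooled feature vector
        self.encoder = backbone
        self.disease_head = nn.Linear(feat_dim, num_diseases)
        self.protected_head = nn.Linear(feat_dim, num_protected_classes)

    def forward(self, x):
        z = self.encoder(x)  # shared representation probed by both heads
        return self.disease_head(z), self.protected_head(z)


# Joint objective: disease loss plus an auxiliary protected-attribute loss,
# weighted by `alpha` (a hyperparameter chosen here purely for illustration).
model = MultiTaskDetector(num_diseases=14, num_protected_classes=2)
criterion = nn.CrossEntropyLoss()

x = torch.randn(4, 3, 224, 224)          # dummy batch standing in for images
y_disease = torch.randint(0, 14, (4,))   # dummy disease labels
y_protected = torch.randint(0, 2, (4,))  # dummy protected-attribute labels

logits_disease, logits_protected = model(x)
alpha = 0.1
loss = criterion(logits_disease, y_disease) \
    + alpha * criterion(logits_protected, y_protected)
loss.backward()
```

A related inspection-style variant, also consistent with the abstract's framing, is to freeze the encoder of a disease-only model and fit just `protected_head` on its features: if that linear probe predicts the protected characteristic well, the representation demonstrably encodes it, even if the disease head never uses it explicitly.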