Foundation models are considered a breakthrough across applications of AI, promising robust and reusable mechanisms for feature extraction and alleviating the need for large amounts of high-quality annotated training data for task-specific prediction models. However, foundation models may encode and even reinforce existing biases present in historic datasets. Given the limited ability to scrutinize foundation models, it remains unclear whether the opportunities outweigh the risks in safety-critical applications such as clinical decision-making. In our statistical bias analysis of a recently published and publicly accessible chest X-ray foundation model, we found reasons for concern, as the model seems to encode protected characteristics including biological sex and racial identity. When used for the downstream application of disease detection, we observed a substantial degradation in the performance of the foundation model compared to a standard model, with specific disparities affecting protected subgroups. While research into foundation models for healthcare applications is at an early stage, we hope to raise awareness of the risks by highlighting the importance of conducting thorough bias and subgroup performance analyses.
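The closing call for thorough bias and subgroup performance analyses can be made concrete with a minimal sketch: given a table of model scores, disease labels, and a protected attribute, per-subgroup AUROC gaps are one simple disparity signal of the kind the abstract describes. This is an illustration only, not the authors' analysis code; all column names and the synthetic data are hypothetical.

```python
# Minimal sketch of a subgroup performance analysis: compare
# disease-detection AUROC across protected subgroups. Large gaps
# between groups are one signal of encoded bias.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_auroc(df: pd.DataFrame, group_col: str,
                   label_col: str = "disease_label",
                   score_col: str = "model_score") -> pd.Series:
    """Compute AUROC separately within each protected subgroup."""
    return pd.Series({
        name: roc_auc_score(g[label_col], g[score_col])
        for name, g in df.groupby(group_col)
    })

# Synthetic example (illustration only; real analyses would use
# held-out predictions from the model under audit).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "sex": rng.choice(["female", "male"], size=n),
    "disease_label": rng.integers(0, 2, size=n),
})
df["model_score"] = rng.random(n)

print(subgroup_auroc(df, "sex"))
```

In practice, such point estimates would be accompanied by confidence intervals (e.g., via bootstrapping) before concluding that an observed subgroup gap reflects a genuine disparity rather than sampling noise.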