Federated learning (FL) has attracted considerable interest from researchers and practitioners as a way to train machine learning (ML) models for healthcare. Ensuring the trustworthiness of these models is essential. In particular, bias, defined as a disparity in the model's predictive performance across different subgroups, may cause unfairness against specific subgroups, which is undesirable in trustworthy ML models. In this research, we address the question of to what extent bias occurs in medical FL and how excessive bias can be prevented through reward systems. We first evaluate how to measure the contributions of institutions toward predictive performance and bias in cross-silo medical FL with a Shapley value approximation method. In a second step, we design different reward systems that incentivize contributions toward high predictive performance or low bias. We then propose a combined reward system that incentivizes contributions toward both. We evaluate our work using multiple medical chest X-ray datasets, focusing on patient subgroups defined by patient sex and age. Our results show that we can successfully measure contributions toward bias and that an integrated reward system successfully incentivizes contributions toward a well-performing model with low bias. While the partitioning of scans only slightly influences the overall bias, institutions with data predominantly from one subgroup introduce a favorable bias for this subgroup. Our results indicate that reward systems that focus only on predictive performance can transfer model bias against patients to an institutional level. Our work helps researchers and practitioners design reward systems for FL with well-aligned incentives for trustworthy ML.
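To make the contribution measurement concrete, the following is a minimal sketch of how Shapley values for predictive performance and for bias could be approximated. The abstract does not name the approximation method, so Monte Carlo permutation sampling is used here as one common choice. The institutions, scan counts, the saturating performance function, and the `BIAS_WEIGHT` combination are all hypothetical stand-ins chosen for illustration; a real study would train and evaluate an FL model for each coalition, and the combined reward system proposed in the work may look quite different.

```python
import random
from typing import Callable, Dict, FrozenSet, List

# Hypothetical toy data: number of scans per patient subgroup at each institution.
INSTITUTIONS: Dict[str, Dict[str, int]] = {
    "A": {"female": 300, "male": 300},
    "B": {"female": 50, "male": 550},
    "C": {"female": 500, "male": 100},
}
SUBGROUPS = ("female", "male")


def subgroup_performance(coalition: FrozenSet[str]) -> Dict[str, float]:
    """Stand-in for training and evaluating an FL model on a coalition.

    Assumption: performance on a subgroup saturates with the number of
    scans n from that subgroup as n / (n + 200).
    """
    perf = {}
    for group in SUBGROUPS:
        n = sum(INSTITUTIONS[inst][group] for inst in coalition)
        perf[group] = n / (n + 200)
    return perf


def performance(coalition: FrozenSet[str]) -> float:
    """Overall predictive performance: mean over subgroups."""
    perf = subgroup_performance(coalition)
    return sum(perf.values()) / len(perf)


def bias(coalition: FrozenSet[str]) -> float:
    """Bias: gap between the best and worst subgroup performance."""
    perf = subgroup_performance(coalition)
    return max(perf.values()) - min(perf.values())


def monte_carlo_shapley(
    players: List[str],
    utility: Callable[[FrozenSet[str]], float],
    num_permutations: int = 500,
    seed: int = 0,
) -> Dict[str, float]:
    """Approximate Shapley values by averaging marginal contributions
    over randomly sampled join orders of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    for _ in range(num_permutations):
        order = list(players)
        rng.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        previous = utility(coalition)
        for player in order:
            coalition = coalition | {player}
            current = utility(coalition)
            totals[player] += current - previous
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


players = list(INSTITUTIONS)
perf_values = monte_carlo_shapley(players, performance)
bias_values = monte_carlo_shapley(players, bias)

# Hypothetical combined reward: credit performance contributions and
# penalize contributions that increase bias. The weight is an assumption.
BIAS_WEIGHT = 1.0
combined_reward = {
    p: perf_values[p] - BIAS_WEIGHT * bias_values[p] for p in players
}

for p in players:
    print(p, round(perf_values[p], 3), round(bias_values[p], 3),
          round(combined_reward[p], 3))
```

In this sketch, a positive Shapley value for `bias` means the institution tends to widen the gap between subgroups, and a negative value means it tends to narrow it. Because each sampled permutation telescopes from the empty coalition to the full one, the estimated values always sum to the difference in utility between those two coalitions, which is a handy sanity check.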