in healthcare. However, existing AI models may be biased in their decision making. Bias induced by the data itself, such as collecting data from only some subgroups, can be mitigated by including more diverse data. Distributed and collaborative learning is an approach that trains models on massive, heterogeneous, and distributed data sources, also known as nodes. In this work, we examine the fairness issue in Swarm Learning (SL), a recent edge-computing-based decentralized machine learning approach designed for heterogeneous illness detection in precision medicine. SL has achieved high performance in clinical applications, but no attempt has been made to evaluate whether SL can improve fairness. To address this problem, we present an empirical study comparing fairness among single-node training, SL, and centralized training. Specifically, we evaluate on a large publicly available skin lesion dataset, which contains samples from various subgroups. The experiments demonstrate that SL does not exacerbate the fairness problem compared to centralized training and improves both performance and fairness compared to single-node training. However, biases still exist in the SL model, and implementing SL is more complex than the other two strategies.
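To make the comparison described above concrete, the following is a minimal sketch of one way fairness across subgroups could be quantified for the three training strategies. The abstract does not name the fairness metric used, so the largest gap in per-subgroup accuracy is an assumption chosen for illustration. The function names (`subgroup_accuracies`, `accuracy_gap`), the subgroup labels, and all toy predictions are hypothetical and are not results from the study.

```python
from collections import defaultdict


def subgroup_accuracies(labels, preds, groups):
    """Return a dict mapping each subgroup to its classification accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p, g in zip(labels, preds, groups):
        total[g] += 1
        correct[g] += int(y == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(labels, preds, groups):
    """Assumed fairness measure: best minus worst subgroup accuracy.

    A gap of 0 means every subgroup is classified equally well.
    """
    acc = subgroup_accuracies(labels, preds, groups)
    return max(acc.values()) - min(acc.values())


# Synthetic toy inputs (hypothetical, for illustration only).
labels = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Placeholder predictions for each training strategy; not study results.
predictions = {
    "single_node": [1, 0, 1, 0, 0, 1, 1, 0],
    "swarm": [1, 0, 1, 0, 1, 0, 1, 1],
    "centralized": [1, 0, 1, 1, 1, 0, 1, 1],
}

gaps = {name: accuracy_gap(labels, p, groups) for name, p in predictions.items()}
for name, gap in gaps.items():
    print(f"{name}: accuracy gap = {gap:.2f}")
```

Under this assumed metric, a smaller gap indicates more uniform performance across subgroups. Overall accuracy should be reported alongside it, since a model can shrink the gap simply by performing worse on every subgroup.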