Horizontal federated learning (HFL) enables distributed clients to train a shared model while keeping their data private. Data heterogeneity among clients is a major concern when training high-quality HFL models. However, because of security concerns and the complexity of deep learning models, investigating data heterogeneity across clients is challenging. To address this issue, we conducted a requirement analysis and developed HetVis, a visual analytics tool that lets participating clients explore data heterogeneity. We identify data heterogeneity by comparing the prediction behaviors of the global federated model and a stand-alone model trained on local data. We then apply context-aware clustering to the inconsistent records to summarize the data heterogeneity. Building on the proposed comparison techniques, we develop a novel set of visualizations to identify heterogeneity issues in HFL. Three case studies show how HetVis can assist client analysts in understanding different types of heterogeneity issues. Expert reviews and a comparative study demonstrate the effectiveness of HetVis.
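The comparison step described in the abstract can be illustrated with a minimal sketch. It assumes that each model's predictions are available as plain lists of class labels. The function names `find_inconsistent` and `group_by_disagreement` are hypothetical, and the toy labels are invented for illustration. Grouping by the (local, global) label pair is a deliberately simple stand-in, not the context-aware clustering that HetVis uses.

```python
from collections import defaultdict


def find_inconsistent(global_preds, local_preds):
    """Return indices of records on which the two models disagree."""
    if len(global_preds) != len(local_preds):
        raise ValueError("prediction lists must have the same length")
    return [
        i
        for i, (g, l) in enumerate(zip(global_preds, local_preds))
        if g != l
    ]


def group_by_disagreement(global_preds, local_preds):
    """Group inconsistent records by their (local, global) label pair.

    Assumption: this simple grouping only stands in for the
    context-aware clustering named in the abstract.
    """
    groups = defaultdict(list)
    for i in find_inconsistent(global_preds, local_preds):
        groups[(local_preds[i], global_preds[i])].append(i)
    return dict(groups)


# Toy predictions (invented): one label per local record.
global_preds = [0, 1, 1, 2, 0, 1]  # global federated model
local_preds = [0, 1, 2, 2, 1, 2]   # stand-alone model trained on local data

inconsistent = find_inconsistent(global_preds, local_preds)
summary = group_by_disagreement(global_preds, local_preds)
print(inconsistent)  # [2, 4, 5]
print(summary)       # {(2, 1): [2, 5], (1, 0): [4]}
```

In this sketch, a large group such as `(2, 1)` would suggest that records the local model labels as class 2 are systematically pulled toward class 1 by the global model, which is the kind of pattern a client analyst might then inspect further.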