Federated learning (FL) is an effective mechanism for preserving data privacy in recommender systems because it runs machine learning model training on-device. While prior FL optimizations tackled the data and system heterogeneity challenges faced by FL, they assumed the two are independent of each other. This fundamental assumption does not reflect real-world, large-scale recommender systems, where data and system heterogeneity are tightly intertwined. This paper takes a data-driven approach to show the inter-dependence of data and system heterogeneity in real-world data and quantifies its impact on overall model quality and fairness. We design a framework, RF^2, to model this inter-dependence and evaluate its impact on state-of-the-art model optimization techniques for federated recommendation tasks. We demonstrate that the impact on fairness can be severe under realistic heterogeneity scenarios: up to 15.8--41x larger than under the simple setup assumed in most (if not all) prior work. This means that when realistic system-induced data heterogeneity is not properly modeled, the fairness impact of an optimization can be understated by up to 41x. The result shows that modeling realistic system-induced data heterogeneity is essential to achieving fair federated recommendation learning. We plan to open-source RF^2 to enable future design and evaluation of FL innovations.