Deep retrieval models are widely used for learning entity representations and recommendations. Federated learning provides a privacy-preserving way to train these models without requiring centralization of user data. However, federated deep retrieval models usually perform much worse than their centralized counterparts because the training data on clients is non-IID (not independent and identically distributed), an intrinsic property of federated learning that limits the negatives available for training. We demonstrate that this issue is distinct from the commonly studied client drift problem. This work proposes batch-insensitive losses as a way to alleviate the non-IID negatives issue for federated movie recommendations. We explore a variety of techniques and find that batch-insensitive losses can effectively improve the performance of federated deep retrieval models, increasing the relative recall of the federated model by up to 93.15% and reducing the relative gap in recall between it and a centralized model from 27.22%–43.14% to 0.53%–2.42%. We also open-source our code framework to accelerate further research and applications of federated deep retrieval models.
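To make the limited-negatives issue concrete, the following is a minimal sketch of a batch-sensitive loss. It assumes an in-batch softmax loss, in which each example's negatives are the other items in the same batch. The abstract does not name the baseline loss, so this choice, the function name `in_batch_softmax_losses`, and the toy embeddings are all illustrative assumptions rather than the authors' implementation.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_softmax_losses(user_embs, item_embs):
    """Per-example softmax cross-entropy with in-batch negatives.

    Row i treats item i as its positive and every other item in the
    batch as a negative, so the loss depends on the batch composition.
    (Assumed loss form for illustration; not taken from the paper.)
    """
    losses = []
    for i, u in enumerate(user_embs):
        logits = [dot(u, v) for v in item_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_norm - logits[i])
    return losses


# Hypothetical toy embeddings: three (user, item) pairs.
users = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
items = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.7]]

# The same pair scored in a mixed batch versus alone on a client.
loss_with_negatives = in_batch_softmax_losses(users, items)[0]
loss_alone = in_batch_softmax_losses(users[:1], items[:1])[0]
```

Under these assumptions, the pair scored alone has no negatives to contrast against, so its loss is zero whatever the embeddings are and it contributes no training signal, while the same pair in a mixed batch yields a positive loss. A batch-insensitive loss, in the sense used above, is one whose value for a pair does not change in this way with the batch composition.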