We consider a federated representation learning framework in which, with the assistance of a central server, a group of $N$ distributed clients collaboratively train over their private data the representations (or embeddings) of a set of entities (e.g., users in a social network). Under this framework, for the key step of privately aggregating the local embeddings trained at the clients, we develop a secure embedding aggregation protocol named SecEA, which provides information-theoretic privacy guarantees \emph{simultaneously} for each client's set of entities and the corresponding embeddings, against a curious server and up to $T < N/2$ colluding clients. As the first step of SecEA, the system performs a private entity union, through which each client learns all the entities in the system without learning which entities belong to which clients. In each aggregation round, the local embeddings are secret-shared among the clients via Lagrange interpolation, and each client then constructs coded queries to retrieve the aggregated embeddings of its intended entities. We conduct comprehensive experiments on various representation learning tasks to evaluate the utility and efficiency of SecEA, and empirically demonstrate that, compared with embedding aggregation protocols without (or with weaker) privacy guarantees, SecEA incurs negligible performance loss (within 5%), and its additional computation latency diminishes as deeper models are trained on larger datasets.
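To make the secret-sharing step concrete, the sketch below illustrates $T$-private sharing of a quantized embedding vector over a prime field, realized via polynomial evaluation and recovered via Lagrange interpolation. This is a minimal illustration of the general technique, not the SecEA protocol itself: the field size, function names, and quantization are assumptions, and the actual protocol additionally handles entity alignment and coded queries.

\begin{verbatim}
# Hypothetical sketch: T-private Lagrange secret sharing of a quantized
# embedding vector over a prime field. Illustrative only; not the SecEA
# implementation.
import random

PRIME = 2**31 - 1  # field size is an assumption for illustration


def share_embedding(embedding, n_clients, t_privacy):
    """Split a quantized embedding into n_clients shares so that any
    t_privacy shares reveal nothing about the embedding (Shamir-style
    sharing realized via polynomial evaluation)."""
    shares = [[] for _ in range(n_clients)]
    for value in embedding:
        # Degree-t_privacy polynomial with the secret as constant term.
        coeffs = [value % PRIME] + [random.randrange(PRIME)
                                    for _ in range(t_privacy)]
        for i in range(n_clients):
            x = i + 1  # client i holds the evaluation at point i+1
            y = sum(c * pow(x, k, PRIME)
                    for k, c in enumerate(coeffs)) % PRIME
            shares[i].append(y)
    return shares


def reconstruct(points):
    """Recover the secret (polynomial value at x=0) from at least
    t_privacy+1 shares via Lagrange interpolation over the field."""
    secret = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = num * (-xm) % PRIME
                den = den * (xj - xm) % PRIME
        # Modular inverse of den via Fermat's little theorem.
        secret = (secret + yj * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    emb = [12, 7, 42]   # toy quantized embedding
    N, T = 5, 2         # N clients, tolerating T colluding clients
    shares = share_embedding(emb, N, T)
    # Any T+1 shares of a coordinate reconstruct it exactly.
    pts = [(i + 1, shares[i][0]) for i in range(T + 1)]
    assert reconstruct(pts) == emb[0] % PRIME
\end{verbatim}

Because this sharing is additively homomorphic, clients can sum the shares they hold across clients and reconstruct only the aggregate embedding, which is what makes polynomial sharing a natural fit for private aggregation.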