This paper considers the secure aggregation problem for federated learning under an information theoretic cryptographic formulation, where distributed training nodes (referred to as users) train models based on their own local data and a curious-but-honest server aggregates the trained models without retrieving other information about users' local data. Secure aggregation generally contains two phases, namely key sharing phase and model aggregation phase. Due to the common effect of user dropouts in federated learning, the model aggregation phase should contain two rounds, where in the first round the users transmit masked models and, in the second round, according to the identity of surviving users after the first round, these surviving users transmit some further messages to help the server decrypt the sum of users' trained models. The objective of the considered information theoretic formulation is to characterize the capacity region of the communication rates in the two rounds from the users to the server in the model aggregation phase, assuming that key sharing has already been performed offline in prior. In this context, Zhao and Sun completely characterized the capacity region under the assumption that the keys can be arbitrary random variables. More recently, an additional constraint, known as "uncoded groupwise keys," has been introduced. This constraint entails the presence of multiple independent keys within the system, with each key being shared by precisely S users. The capacity region for the information-theoretic secure aggregation problem with uncoded groupwise keys was established in our recent work subject to the condition S > K - U, where K is the number of total users and U is the designed minimum number of surviving users. In this paper we fully characterize of the the capacity region for this problem by proposing a new converse bound and an achievable scheme.
翻译:暂无翻译