An often unquestioned assumption underlying most current federated learning algorithms is that all participants use identical model architectures. In this work, we initiate a theoretical study of model-agnostic communication protocols that would allow data holders (agents) using different models to collaborate with each other and perform federated learning. We focus on the setting where two agents attempt to perform kernel regression using different kernels (and hence have different models). Our study yields a surprising result: the most natural algorithm, alternating knowledge distillation (AKD), imposes overly strong regularization and may lead to severe under-fitting. Our theory also shows an interesting connection between AKD and the alternating projection algorithm for finding the intersection of sets. Leveraging this connection, we propose new algorithms that improve upon AKD. Our theoretical predictions also closely match real-world experiments using neural networks. Thus, our work proposes a rich yet tractable framework for analyzing and developing new practical model-agnostic federated learning algorithms.
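The alternating knowledge distillation loop summarized above can be illustrated with a minimal sketch. It assumes kernel ridge regression, two Gaussian kernels of different bandwidths standing in for the "different kernels", and a simple protocol in which each agent fits the other agent's predictions on its own inputs. The function names (`rbf_kernel`, `fit_krr`, `alternating_kd`), the regularization value, and the synthetic data are all hypothetical choices made for illustration, not the paper's exact algorithm or experimental setup.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def fit_krr(kernel, x_train, y_train, reg):
    """Fit kernel ridge regression and return a prediction function."""
    gram = kernel(x_train, x_train)
    alpha = np.linalg.solve(gram + reg * np.eye(len(x_train)), y_train)
    return lambda x: kernel(x, x_train) @ alpha


def alternating_kd(x1, y1, x2, kernel1, kernel2, rounds, reg=0.1):
    """Illustrative AKD loop (assumed protocol, not the paper's exact one).

    Agent 1 holds inputs x1 with labels y1; agent 2 holds inputs x2.
    Returns the norm of agent 1's predictions on x1 after each round.
    """
    norms = []
    targets = y1
    for _ in range(rounds):
        # Agent 1 fits its current targets on its own inputs.
        model1 = fit_krr(kernel1, x1, targets, reg)
        norms.append(float(np.linalg.norm(model1(x1))))
        # Agent 2 distills: it fits agent 1's predictions on x2.
        model2 = fit_krr(kernel2, x2, model1(x2), reg)
        # Agent 1 distills back: agent 2's predictions on x1 become the new targets.
        targets = model2(x1)
    return norms


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.uniform(-1.0, 1.0, size=(30, 1))
    x2 = rng.uniform(-1.0, 1.0, size=(30, 1))
    y1 = np.sin(3.0 * x1[:, 0])

    def kernel_wide(a, b):
        return rbf_kernel(a, b, gamma=1.0)    # agent 1: wide bandwidth

    def kernel_narrow(a, b):
        return rbf_kernel(a, b, gamma=10.0)   # agent 2: narrow bandwidth

    print(alternating_kd(x1, y1, x2, kernel_wide, kernel_narrow, rounds=10))
```

In this sketch, every ridge fit shrinks its targets on the inputs it was trained on, so repeated rounds compound the shrinkage. This gives some intuition for the over-regularization described above, under the stated assumptions; it is not a substitute for the paper's analysis.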