Is it possible to design a universal API for federated learning with which an ad-hoc group of data-holders (agents) can collaborate and perform federated learning? Such an API would necessarily need to be model-agnostic, i.e., make no assumptions about the model architectures used by the agents, and it also cannot rely on having representative public data at hand. Knowledge distillation (KD) is the obvious tool of choice for designing such protocols. Surprisingly, however, we show that most natural KD-based federated learning protocols perform poorly. To investigate why, we propose a new theoretical framework, Federated Kernel Ridge Regression, which captures both model heterogeneity and data heterogeneity. Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity. We further validate our framework by analyzing and designing new protocols based on KD. Their performance in real-world experiments using neural networks, though still unsatisfactory, closely matches our theoretical predictions.
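To make the setting concrete, below is a minimal, self-contained sketch of one KD-based exchange between two agents that each fit kernel ridge regression on private data. This is an illustration under assumed names (Agent, rbf_kernel) and a deliberately simple distillation step, not the exact protocol or analysis from the paper; it only shows the kind of model-agnostic exchange the abstract refers to, and why data heterogeneity can hurt it.

```python
# Illustrative sketch (not the paper's protocol): two agents hold data from
# disjoint input regions, fit kernel ridge regression locally, and one agent
# distills the other's knowledge via predicted (soft) labels.
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

class Agent:
    """Holds a private dataset and fits kernel ridge regression on it."""
    def __init__(self, X, y, lam=1e-2):
        self.X, self.y, self.lam = X, y, lam

    def fit(self, X=None, y=None):
        X = self.X if X is None else X
        y = self.y if y is None else y
        K = rbf_kernel(X, X)
        # Closed-form KRR: alpha = (K + lam*I)^{-1} y
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(X)), y)
        self.X_fit = X

    def predict(self, Xq):
        return rbf_kernel(Xq, self.X_fit) @ self.alpha

# Heterogeneous local data: each agent only sees one half of the input space.
rng = np.random.default_rng(0)
X1 = rng.uniform(-2, 0, size=(30, 1)); y1 = np.sin(3 * X1).ravel()
X2 = rng.uniform(0, 2, size=(30, 1));  y2 = np.sin(3 * X2).ravel()
a1, a2 = Agent(X1, y1), Agent(X2, y2)
a1.fit(); a2.fit()

# KD step: agent 1 draws unlabeled queries from its own region, asks agent 2
# for predictions (soft labels), and re-fits on real + distilled targets.
X_u = rng.uniform(-2, 0, size=(30, 1))
y_u = a2.predict(X_u)   # agent 2 never saw this region, so these labels are poor
a1.fit(np.vstack([X1, X_u]), np.concatenate([y1, y_u]))
```

Because agent 2 has never observed inputs in agent 1's region, its distilled labels there carry little signal; this is the kind of degradation under data heterogeneity that the abstract refers to.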