Homogenizing Non-IID datasets via In-Distribution Knowledge Distillation for Decentralized Learning

Decentralized learning enables serverless training of deep neural networks (DNNs) in a distributed manner on multiple nodes. This makes it possible to use large datasets and to train on a wide variety of data sources. However, one of the key challenges with decentralized learning is heterogeneity in the data distribution across the nodes. In this paper, we propose In-Distribution Knowledge Distillation (IDKD) to address the challenge of heterogeneous data distribution. The goal of IDKD is to homogenize the data distribution across the nodes. While such data homogenization can be achieved by exchanging data among the nodes, at the cost of privacy, IDKD achieves the same objective using a common public dataset across nodes without breaking the privacy constraint. This public dataset is different from the training dataset and is used to distill the knowledge from each node and communicate it to its neighbors through the generated labels. With traditional knowledge distillation, the generalization of the distilled model is reduced because all the public dataset samples are used irrespective of their similarity to the local dataset. Thus, we introduce an Out-of-Distribution (OoD) detector at each node to label a subset of the public dataset that maps close to the local training data distribution. Finally, only the labels corresponding to these subsets are exchanged among the nodes and, with appropriate label averaging, each node is fine-tuned on these data subsets along with its local data. Our experiments on multiple image classification datasets and graph topologies show that the proposed IDKD scheme is more effective than traditional knowledge distillation and achieves state-of-the-art generalization performance on heterogeneously distributed data with minimal communication overhead.
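
The labeling and averaging steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the OoD detector or the averaging rule, so the sketch assumes a maximum-softmax-probability score with a fixed threshold and a plain per-sample mean over the nodes that selected that sample. The names `select_in_distribution`, `average_labels`, and the toy predictions are hypothetical.

```python
from typing import Dict, List

Probs = List[float]  # soft label: one probability per class


def select_in_distribution(preds: List[Probs], threshold: float) -> Dict[int, Probs]:
    """Keep the public samples a node considers in-distribution.

    Assumption: a sample is in-distribution when the node's highest class
    probability reaches `threshold` (a stand-in for the OoD detector).
    Returns a map from public-sample index to the node's soft label.
    """
    return {i: p for i, p in enumerate(preds) if max(p) >= threshold}


def average_labels(selections: List[Dict[int, Probs]]) -> Dict[int, Probs]:
    """Average soft labels per public sample over the nodes that selected it.

    Assumption: an unweighted mean; the paper only says "appropriate
    label averaging".
    """
    sums: Dict[int, Probs] = {}
    counts: Dict[int, int] = {}
    for selection in selections:
        for idx, probs in selection.items():
            if idx not in sums:
                sums[idx] = [0.0] * len(probs)
                counts[idx] = 0
            sums[idx] = [s + p for s, p in zip(sums[idx], probs)]
            counts[idx] += 1
    return {idx: [s / counts[idx] for s in total] for idx, total in sums.items()}


# Toy example: two nodes, three public samples, two classes (made-up values).
node_a = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
node_b = [[0.7, 0.3], [0.1, 0.9], [0.55, 0.45]]

# Each node labels only its in-distribution subset; only these labels are exchanged.
selections = [select_in_distribution(p, threshold=0.7) for p in (node_a, node_b)]

# Averaged labels form the shared distillation set each node would fine-tune on,
# together with its own local data.
distilled = average_labels(selections)
```

In this sketch only sample indices and soft labels cross node boundaries, never the local training data, which matches the privacy constraint the abstract describes. Public samples that no node selects are simply absent from the distilled set.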
