Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR

From arXiv: S. Ek, R. Rombourg, F. Portet, and P. Lalanda, "Federated Self-Supervised Learning in Heterogeneous Settings: Limits of a Baseline Approach on HAR," 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops), 2022, pp. 557-562.

Federated learning is a new machine learning paradigm in which a model is learned in a distributed fashion across independent devices. One of its many advantages is that training data stay on the devices (such as smartphones), and only learned models are shared with a central server. In supervised learning, labeling is entrusted to the clients. However, acquiring such labels can be prohibitively expensive and error-prone for many tasks, such as human activity recognition (HAR). Hence, a wealth of data remains unlabelled and unexploited. Existing federated learning approaches focus mainly on supervised learning and have largely ignored this mass of unlabelled data. Furthermore, it is unclear whether standard federated learning approaches are suited to self-supervised learning. The few studies that have addressed the problem have limited themselves to the favorable situation of homogeneous datasets. This work lays the groundwork for a reference evaluation of federated learning combined with self-supervised learning in a realistic setting. We show that a standard lightweight autoencoder trained with standard Federated Averaging fails to learn a robust representation for human activity recognition across several realistic heterogeneous datasets. These findings call for a more intensive research effort in federated self-supervised learning to exploit the mass of heterogeneous unlabelled data present on mobile devices.
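To make the baseline named above more concrete, here is a minimal sketch of one way "an autoencoder trained with Federated Averaging" can be wired together: each client fits an autoencoder to its own unlabelled data, and the server averages the returned weights in proportion to each client's sample count. Everything here is an illustrative assumption rather than the authors' setup: the tied-weight linear autoencoder, plain gradient descent, the hyperparameters, the synthetic Gaussian clients standing in for heterogeneous HAR datasets, and all function names (`fedavg`, `local_autoencoder_update`, `federated_round`, `reconstruction_loss`) are hypothetical.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average each parameter array across clients,
    weighting client k by n_k / n (its share of all samples)."""
    total = float(sum(client_sizes))
    return [
        sum(w[i] * (n / total) for w, n in zip(client_weights, client_sizes))
        for i in range(len(client_weights[0]))
    ]

def local_autoencoder_update(weights, x, lr=0.01, epochs=5):
    """Hypothetical local step: tied-weight linear autoencoder (encode x @ w,
    decode with w.T) trained by gradient descent on MSE reconstruction loss.
    Only the unlabelled inputs x of shape [n, d] are used, never labels."""
    (w,) = weights
    w = w.copy()
    for _ in range(epochs):
        err = x @ w @ w.T - x  # reconstruction error, shape [n, d]
        # Gradient of mean((x w w^T - x)^2) with respect to w
        grad = (x.T @ err @ w + err.T @ x @ w) * (2.0 / x.size)
        w -= lr * grad
    return [w]

def federated_round(global_weights, client_data, lr=0.01, epochs=5):
    """One round: every client trains from the global weights, server averages."""
    updates = [local_autoencoder_update(global_weights, x, lr, epochs) for x in client_data]
    return fedavg(updates, [len(x) for x in client_data])

def reconstruction_loss(weights, x):
    (w,) = weights
    return float(np.mean((x @ w @ w.T - x) ** 2))

# Usage with three synthetic, deliberately non-IID clients (assumed data, not HAR)
rng = np.random.default_rng(0)
d, k = 6, 3
clients = [rng.normal(loc=m, scale=s, size=(n, d))
           for m, s, n in [(0.0, 1.0, 200), (2.0, 0.5, 50), (-1.0, 2.0, 120)]]
weights = [rng.normal(scale=0.1, size=(d, k))]
pooled = np.vstack(clients)
before = reconstruction_loss(weights, pooled)
for _ in range(20):
    weights = federated_round(weights, clients)
after = reconstruction_loss(weights, pooled)
print(f"pooled reconstruction MSE: {before:.3f} -> {after:.3f}")
```

Weighting by n_k lets data-rich clients dominate the global model; when clients are heterogeneous, that average need not suit any single client well. This toy run only checks that the pooled reconstruction error goes down. It does not reproduce the paper's evaluation, which measures how useful the learned representation is for activity recognition.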
