Recent advances in wearable devices and the Internet of Things (IoT) have led to massive growth in the sensor data generated on edge devices. Labeling such massive data for classification tasks has proven challenging. In addition, data generated by different users carry distinct personal attributes and edge heterogeneity, making it impractical to develop a single global model that adapts well to all users. Concerns over data privacy and communication costs also prohibit centralized data accumulation and training. We propose SemiPFL, a framework that supports edge users who have no labels or only a limited labeled dataset, together with a sizable amount of unlabeled data that is nonetheless insufficient to train a well-performing model. In this work, edge users collaborate to train a hypernetwork on the server, which generates a personalized autoencoder for each user. After receiving updates from edge users, the server produces a set of base models for each user, which the user aggregates locally using their own labeled dataset. We comprehensively evaluate the proposed framework on various public datasets from a wide range of application scenarios, from wearable health to IoT, and demonstrate that SemiPFL outperforms state-of-the-art federated learning frameworks under the same assumptions in terms of user performance, network footprint, and computational consumption. We also show that the solution performs well for users with no labels or limited labeled datasets, and that performance improves as the amount of labeled data and the number of users increase, signifying the effectiveness of SemiPFL in handling data heterogeneity and limited annotation. Finally, we demonstrate the stability of SemiPFL in handling heterogeneity in user hardware resources in three real-time scenarios.
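The local aggregation step described above can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the accuracy-weighted voting used here is an assumption chosen for illustration, not the SemiPFL method itself; the names `accuracy`, `aggregation_weights`, and `ensemble_predict` are hypothetical. The sketch treats each base model received from the server as a plain function from a feature vector to a class label.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A base model maps a feature vector to a class label (hypothetical interface).
Model = Callable[[Sequence[float]], int]
Labeled = List[Tuple[Sequence[float], int]]


def accuracy(model: Model, labeled: Labeled) -> float:
    """Fraction of the user's labeled samples that the model classifies correctly."""
    if not labeled:
        return 0.0
    correct = sum(1 for x, y in labeled if model(x) == y)
    return correct / len(labeled)


def aggregation_weights(base_models: List[Model], labeled: Labeled) -> List[float]:
    """Assumed rule: weight each base model by its accuracy on the local labeled set.

    Falls back to uniform weights when no model scores above zero, which also
    covers a user with no labeled data at all.
    """
    scores = [accuracy(m, labeled) for m in base_models]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(base_models)] * len(base_models)
    return [s / total for s in scores]


def ensemble_predict(base_models: List[Model], weights: List[float],
                     x: Sequence[float]) -> int:
    """Weighted vote over the labels predicted by the base models."""
    votes: Dict[int, float] = {}
    for model, weight in zip(base_models, weights):
        label = model(x)
        votes[label] = votes.get(label, 0.0) + weight
    return max(votes, key=votes.get)


# Toy usage: three stand-in base models and four labeled samples.
models: List[Model] = [
    lambda x: int(x[0] > 0.5),   # agrees with the local labels
    lambda x: 1,                 # always predicts class 1
    lambda x: int(x[0] <= 0.5),  # systematically wrong for this user
]
local_labeled: Labeled = [([0.1], 0), ([0.9], 1), ([0.4], 0), ([0.6], 1)]
weights = aggregation_weights(models, local_labeled)
prediction = ensemble_predict(models, weights, [0.2])
```

In this toy setup the model that disagrees with every local label receives zero weight, so the user's own labeled data decides which base models contribute to the prediction. A user with no labels falls back to a uniform vote.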