Machine learning (ML) systems are increasingly prevalent, powering a growing number of applications and services in daily life. This has raised concerns over user privacy, since human interaction data typically must be transmitted to the cloud to train and improve such systems. Federated learning (FL) has recently emerged as a method for training ML models on edge devices using sensitive user data, and is seen as a way to mitigate data-privacy concerns. However, since ML models are most commonly trained with label supervision, FL is viable only if labels can be extracted on-device. In this work, we propose a strategy for training FL models using positive and negative user feedback. We also design a novel framework for studying different noise patterns in user feedback, and explore how well standard noise-robust objectives mitigate this noise when training models in a federated setting. We evaluate our proposed training setup through detailed experiments on two text classification datasets and analyze the effects of varying levels of user reliability and feedback noise on model performance. We show that our method improves substantially over a self-training baseline, achieving performance closer to that of models trained with full supervision.
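The abstract does not specify which noise model or noise-robust objective is used, so the following is only an illustrative sketch under common assumptions: symmetric label-flipping noise on binary user feedback, and the generalized cross-entropy (GCE) loss of Zhang & Sabuncu (2018) as a representative noise-robust objective. All function names here are hypothetical, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_labels(labels, noise_rate, rng):
    """Simulate unreliable user feedback by flipping each binary
    label with probability `noise_rate` (symmetric noise).
    Hypothetical helper, not from the paper."""
    flips = rng.random(labels.shape) < noise_rate
    return np.where(flips, 1 - labels, labels)

def gce_loss(probs, labels, q=0.7):
    """Generalized cross-entropy: (1 - p_y**q) / q, which interpolates
    between cross-entropy (q -> 0) and MAE (q = 1) and down-weights
    low-confidence (likely mislabeled) examples."""
    p_y = np.where(labels == 1, probs, 1 - probs)  # prob. of observed label
    return np.mean((1 - p_y ** q) / q)

# Clean ground-truth labels vs. labels corrupted by 30% feedback noise.
labels = rng.integers(0, 2, size=1000)
noisy = flip_labels(labels, noise_rate=0.3, rng=rng)
disagreement = np.mean(labels != noisy)

# A model confident in the clean labels incurs a much lower GCE loss on
# clean feedback than on noisy feedback.
probs = np.where(labels == 1, 0.9, 0.1)
clean_loss = gce_loss(probs, labels)
noisy_loss = gce_loss(probs, noisy)
```

In a federated setting, each client would compute such a loss locally on its own feedback-derived labels and send only model updates, never the feedback itself, to the server.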