Privacy protection is an important and growing concern in Federated Learning, especially for Natural Language Processing. Every day, users produce a large volume of text containing personal information on their client devices. Because using this information directly is likely to violate personal privacy, many methods have been proposed in Federated Learning to shield the central model from the raw data on client devices. In this paper, we approach the problem more linguistically: we distort the text while preserving its semantics. In practice, we leverage a recently proposed metric, Neighboring Distribution Divergence, to evaluate how well semantics are preserved during distortion. Based on this metric, we propose two frameworks for semantics-preserved distortion, a generative one and a substitutive one. Because the current Natural Language Processing field lacks privacy-related tasks, we conduct experiments on named entity recognition and constituency parsing. The results show the plausibility and efficiency of our distortion as a method for protecting personal privacy.
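To give an intuition for how a divergence-based metric can score semantic preservation, the sketch below compares a predictor's distributions at corresponding positions of an original and a distorted sentence and averages their KL divergence. This is only an illustrative toy, not the paper's actual metric: the `train_bigram` stand-in, the corpus, and the `ndd_sketch` scoring loop are all assumptions for demonstration, whereas the real Neighboring Distribution Divergence relies on a pretrained masked language model.

```python
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Toy contextual predictor: a hypothetical stand-in for the masked
    language model the real metric would use."""
    counts = defaultdict(Counter)
    for sent in corpus:
        toks = ["<s>"] + sent.split()
        for prev, cur in zip(toks, toks[1:]):
            counts[prev][cur] += 1

    def predict(prev):
        c = counts[prev]
        total = sum(c.values()) or 1
        return {w: n / total for w, n in c.items()}

    return predict

def kl(p, q, eps=1e-9):
    """KL divergence between two sparse distributions, smoothed with eps."""
    keys = set(p) | set(q)
    return sum(
        max(p.get(k, 0.0), eps) * math.log(max(p.get(k, 0.0), eps) / max(q.get(k, 0.0), eps))
        for k in keys
    )

def ndd_sketch(predict, original, distorted):
    """Average divergence of predictive distributions at aligned positions
    (an NDD-style score; lower means semantics are better preserved)."""
    prev_o, prev_d = "<s>", "<s>"
    scores = []
    for wo, wd in zip(original.split(), distorted.split()):
        scores.append(kl(predict(prev_o), predict(prev_d)))
        prev_o, prev_d = wo, wd
    return sum(scores) / len(scores)
```

Under this toy model, an undistorted sentence scores exactly 0, while a distortion that scrambles the context (and thus the meaning) yields a strictly positive score, which is the qualitative behavior a semantics-preservation metric should exhibit.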