Most privacy-protection studies on textual data focus on removing explicit sensitive identifiers. However, personal writing style, a strong indicator of authorship, is often neglected. Recent work such as SynTF has shown promising results on privacy-preserving text mining, but its anonymization algorithm can only output numeric term vectors that are difficult for recipients to interpret. We propose a novel text generation model with a two-set exponential mechanism for authorship anonymization. By augmenting semantic information through a REINFORCE training reward function, the model can generate differentially private text that preserves the semantics and grammatical structure of the original text while removing personal traits of the writing style. It does not assume any conditioned labels or parallel text data for training. We evaluate the proposed model on a real-life peer-review dataset and the Yelp review dataset. The results suggest that our model outperforms the state of the art on semantic preservation, authorship obfuscation, and stylometric transformation.
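To make the privacy primitive concrete: the abstract's "two-set exponential mechanism" builds on the standard exponential mechanism from differential privacy, which selects an output with probability proportional to the exponential of its utility score scaled by the privacy budget. The sketch below is a minimal, generic illustration of that primitive applied to word substitution; the `candidates` vocabulary, `utility` scores, and function name are hypothetical and do not reproduce the paper's actual model.

```python
# Generic exponential mechanism for differentially private word selection.
# All names and scores here are illustrative, not the authors' method.
import math
import random

def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0):
    """Sample one candidate with probability proportional to
    exp(epsilon * utility / (2 * sensitivity))."""
    weights = [math.exp(epsilon * utility[c] / (2.0 * sensitivity))
               for c in candidates]
    return random.choices(candidates, weights=weights, k=1)[0]

# Toy vocabulary: semantically closer substitutes get higher utility,
# so they are sampled more often, yet every word keeps nonzero probability,
# which is what yields plausible deniability about the original word.
candidates = ["excellent", "great", "good", "bad"]
utility = {"excellent": 0.9, "great": 0.8, "good": 0.7, "bad": 0.1}
word = exponential_mechanism(candidates, utility, epsilon=5.0)
```

A smaller `epsilon` flattens the sampling distribution (stronger privacy, weaker semantic fidelity), while a larger `epsilon` concentrates mass on high-utility substitutes; this is the privacy-utility trade-off the abstract's semantic-preservation results are measured against.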