Besides its linguistic content, speech is rich in biometric information that classifiers can infer. Learning privacy-preserving representations of speech signals enables downstream tasks without sharing unnecessary private information about an individual. In this paper, we show that, when generating a non-sensitive representation of speech, protecting gender information is more effective than modelling speaker-identity information alone. Our method reconstructs speech by decoding linguistic content together with gender information using a variational autoencoder. Specifically, we exploit disentangled representation learning to encode information about different attributes into separate subspaces that can be factorised independently. We present a novel way to encode gender information and to disentangle two sensitive biometric identifiers, namely gender and identity, in a privacy-protecting setting. Experiments on the LibriSpeech dataset show that gender recognition and speaker verification can be reduced to a random guess, protecting against classification-based attacks.
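The core mechanism described above, encoding speech into a factorised latent whose content, gender, and identity subspaces can be manipulated independently before decoding, can be sketched in miniature. This is a toy NumPy illustration only: the dimensions, the random linear maps standing in for trained encoder/decoder networks, and the helper names (`encode`, `split_subspaces`, `decode_private`) are all assumptions for exposition, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumed, not from the paper).
D_IN, D_CONTENT, D_GENDER, D_ID = 64, 16, 2, 8
D_LATENT = D_CONTENT + D_GENDER + D_ID

# Random linear maps stand in for the trained VAE encoder/decoder.
W_enc_mu = rng.standard_normal((D_IN, D_LATENT)) * 0.1
W_enc_logvar = rng.standard_normal((D_IN, D_LATENT)) * 0.1
W_dec = rng.standard_normal((D_CONTENT + D_GENDER, D_IN)) * 0.1

def encode(x):
    """Map a speech frame to mean/log-variance of a factorised latent."""
    return x @ W_enc_mu, x @ W_enc_logvar

def reparameterise(mu, logvar):
    """Sample z = mu + sigma * eps (the VAE reparameterisation trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def split_subspaces(z):
    """Factorise the latent into content / gender / identity blocks."""
    return (z[:D_CONTENT],
            z[D_CONTENT:D_CONTENT + D_GENDER],
            z[D_CONTENT + D_GENDER:])

def decode_private(z_content, gender_code):
    """Reconstruct from content plus a *chosen* gender code only.
    The speaker-identity subspace is dropped, so the output carries
    no identity information beyond what the content itself leaks."""
    return np.concatenate([z_content, gender_code]) @ W_dec

x = rng.standard_normal(D_IN)              # stand-in for one speech frame
mu, logvar = encode(x)
z = reparameterise(mu, logvar)
z_content, z_gender, z_id = split_subspaces(z)

# Privacy-preserving resynthesis: keep linguistic content, substitute a
# neutral gender code, and discard the identity subspace entirely.
neutral_gender = np.full(D_GENDER, 0.5)
x_hat = decode_private(z_content, neutral_gender)
```

Because the decoder never sees the identity subspace and receives only the substituted gender code, a classifier attacking `x_hat` has no more than chance-level signal for those attributes, which is the disentanglement property the paper evaluates on LibriSpeech.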