Privacy and security are major concerns when speech signals are sent to cloud services such as automatic speech recognition (ASR) and speech emotion recognition (SER). Existing speech anonymization solutions mainly rely on voice conversion or voice modification, which converts a raw utterance into another with similar content but different, or no, identity-related information. However, the alternative of sharing speech data in the form of privacy-preserving representations remains largely under-explored. In this paper, we propose a speech anonymization framework that achieves privacy by applying noise perturbation to a selected subset of the high-utility representations extracted by a pre-trained speech encoder. The subset is chosen by a Transformer-based privacy-risk saliency estimator. We validate our framework on four tasks, namely automatic speaker verification (ASV), ASR, SER, and intent classification (IC), to assess both privacy and utility. Experimental results show that our approach achieves competitive, or even better, utility than the speech anonymization baselines from the VoicePrivacy 2022 Challenge at the same level of privacy. Moreover, because the amount of perturbation is easily controlled, our framework offers a flexible range of privacy-utility trade-offs without re-training any component.
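The core idea, perturbing only the representation entries that a saliency estimator flags as privacy-sensitive, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `perturb_salient`, the parameters `ratio` and `noise_std`, the choice of Gaussian noise, and the top-k selection rule are all assumptions made for illustration, and the saliency scores are taken as given rather than produced by a Transformer-based estimator.

```python
import numpy as np


def perturb_salient(features, saliency, ratio=0.2, noise_std=1.0, seed=0):
    """Add noise to the `ratio` most privacy-salient entries of `features`.

    Hypothetical helper: `features` stands in for encoder outputs and
    `saliency` for per-entry privacy-risk scores of the same shape.
    Gaussian noise and top-k selection are assumptions, not the paper's spec.
    """
    features = np.asarray(features, dtype=float)
    saliency = np.asarray(saliency, dtype=float)
    if features.shape != saliency.shape:
        raise ValueError("features and saliency must have the same shape")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")

    # Number of entries to perturb.
    k = int(round(ratio * features.size))

    # Boolean mask marking the k entries with the highest saliency.
    flat_mask = np.zeros(features.size, dtype=bool)
    if k > 0:
        top_idx = np.argpartition(saliency.ravel(), -k)[-k:]
        flat_mask[top_idx] = True
    mask = flat_mask.reshape(features.shape)

    # Perturb only the selected entries; leave the rest untouched.
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_std, size=features.shape)
    return np.where(mask, features + noise, features), mask
```

In this sketch, raising `ratio` or `noise_std` perturbs more of the representation (or perturbs it more strongly) without touching any trained component, which mirrors the kind of privacy-utility control described above.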