Voice assistive technologies have given rise to far-reaching privacy and security concerns. In this paper, we investigate whether modular automatic speech recognition (ASR) can improve privacy in voice assistive systems by combining independently trained separation, recognition, and discretization modules to design configurable privacy-preserving ASR systems. We evaluate privacy concerns and the effects of applying various state-of-the-art techniques at each stage of the system, and report results using task-specific metrics (i.e., WER, ABX, and accuracy). We show that overlapping speech inputs to ASR systems present further privacy concerns, and show how these may be mitigated using speech separation and optimization techniques. Our discretization module is shown to reduce paralinguistic privacy leakage from ASR acoustic models to levels commensurate with random guessing. We show that voice privacy can be configurable, and argue that this presents new opportunities for privacy-preserving applications incorporating ASR.