The emergence of voice-assistant devices has ushered in delightful user experiences, not only in the smart home but also in diverse educational environments, from classrooms to personalized learning and tutoring. However, using voice as an interaction modality can also expose the user's identity, which hinders the broader adoption of voice interfaces; this is especially important in environments where children are present and their voice privacy must be protected. To this end, building on state-of-the-art techniques proposed in the literature, we design and evaluate a practical and efficient framework for voice privacy at the source. The approach combines speaker identification (SID) and speech conversion methods to randomly disguise the identity of users directly on the device that records the speech, while ensuring that the transformed utterances can still be successfully transcribed by Automatic Speech Recognition (ASR) solutions. We evaluate the ASR performance of the conversion in terms of word error rate and show the promise of this framework in preserving the content of the input speech.
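The abstract does not say how word error rate (WER) is computed, so the following is only a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the example utterances are hypothetical, and whitespace tokenization is an assumption; a real evaluation would likely normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts: ASR output for the original utterance
    # versus ASR output for the identity-disguised utterance.
    reference = "turn on the kitchen light"
    converted = "turn on a kitchen light"
    print(word_error_rate(reference, converted))
```

In this setting, a low WER on the converted speech relative to the original would indicate that the conversion preserves the spoken content while changing the speaker's apparent identity.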