With the spread of smart devices such as the Amazon Echo and Apple's HomePod, speech data have become a new dimension of big data. However, privacy and security concerns may hinder the collection and sharing of real-world speech data, because such data contain the speaker's voiceprint, an identifying feature that is considered a type of biometric identifier. Current studies on voiceprint privacy protection provide neither a meaningful privacy-utility trade-off nor a formal and rigorous definition of privacy. In this study, we design a novel and rigorous privacy metric for voiceprint privacy, referred to as voice-indistinguishability, by extending differential privacy. We also propose mechanisms and frameworks for privacy-preserving speech data release that satisfy voice-indistinguishability. Experiments on public datasets verify the effectiveness and efficiency of the proposed methods.