Automatic Speaker Diarization (ASD) is an enabling technology with numerous applications. Because it operates on recordings of multiple speakers, it raises particular privacy concerns. In remote settings, where recordings are shared with a server, clients relinquish not only the privacy of their conversation but also that of all the information that can be inferred from their voices. Yet, to the best of our knowledge, the development of privacy-preserving ASD systems has been overlooked thus far. In this work, we tackle this problem by combining two cryptographic techniques, Secure Multiparty Computation (SMC) and Secure Modular Hashing, and applying them to the two main steps of a cascaded ASD system: speaker embedding extraction and agglomerative hierarchical clustering. Our system achieves a reasonable trade-off between performance and efficiency, with real-time factors of 1.1 and 1.6 for two different SMC security settings.