Automatic Speaker Diarization (ASD) is an enabling technology with numerous applications. Because it operates on recordings of multiple speakers, it raises particular privacy concerns. In remote settings, where recordings are shared with a server, clients relinquish not only the privacy of their conversation but also that of all the information that can be inferred from their voices. However, to the best of our knowledge, the development of privacy-preserving ASD systems has been overlooked thus far. In this work, we tackle this problem by combining two cryptographic techniques, Secure Multiparty Computation (SMC) and Secure Modular Hashing, and applying them to the two main steps of a cascaded ASD system: speaker embedding extraction and agglomerative hierarchical clustering. Our system achieves a reasonable trade-off between performance and efficiency, with real-time factors of 1.1 and 1.6 for two different SMC security settings.