Model inversion (MI) attacks allow an adversary to reconstruct average per-class representations of a machine learning (ML) model's training data. In scenarios where each class corresponds to a different individual, such as face classifiers, this has been shown to pose a severe privacy risk. In this work, we explore a new application for MI: the extraction of speakers' voices from a speaker recognition system. We present an approach to (1) reconstruct audio samples from a trained ML model and (2) extract intermediate voice feature representations that provide valuable insights into the speakers' biometrics. To this end, we propose an extension of MI attacks which we call sliding model inversion. Our sliding MI extends standard MI by iteratively inverting overlapping chunks of the audio samples, thereby leveraging the sequential nature of audio data for improved inversion performance. We show that the inverted audio data can be used to generate spoofed audio samples that impersonate a speaker and to execute voice-protected commands on highly secured systems on their behalf. To the best of our knowledge, our work is the first to extend MI attacks to audio data, and our results highlight the security risks that arise from the extraction of biometric data in this setting.
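To make the idea of sliding model inversion concrete, the following is a minimal, assumption-laden sketch of how such a loop could look in PyTorch: a gradient-based inversion is run on overlapping waveform chunks, and each chunk is initialized from the reconstruction produced so far. The model handle, chunk and hop lengths, optimizer settings, and function names are hypothetical illustrations, not the paper's actual implementation.

```python
# Hypothetical sketch of sliding model inversion on audio.
# Assumes `model` is a speaker classifier mapping a fixed-length
# waveform chunk (1, chunk_len) to class logits (1, n_classes).
import torch

def sliding_model_inversion(model, target_class, total_len=16000,
                            chunk_len=4000, hop=2000,
                            n_steps=200, lr=1e-2):
    """Reconstruct an audio sample for `target_class` by iteratively
    inverting overlapping chunks of the waveform."""
    audio = torch.zeros(total_len)  # running reconstruction
    for start in range(0, total_len - chunk_len + 1, hop):
        # initialize the chunk from the current reconstruction,
        # so overlapping regions carry information forward
        chunk = audio[start:start + chunk_len].clone().requires_grad_(True)
        opt = torch.optim.Adam([chunk], lr=lr)
        for _ in range(n_steps):
            opt.zero_grad()
            logits = model(chunk.unsqueeze(0))
            loss = -logits[0, target_class]  # maximize the target-class logit
            loss.backward()
            opt.step()
        # write the inverted chunk back into the reconstruction
        audio[start:start + chunk_len] = chunk.detach()
    return audio
```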