Mass surveillance systems for voice over IP (VoIP) conversations pose a significant threat to privacy. These automated systems use learning models to analyze conversations and, upon detecting calls that involve specific topics, route them to a human agent. In this study, we present an adversarial-learning-based framework for protecting the privacy of VoIP conversations. We present a novel algorithm that finds a universal adversarial perturbation (UAP), which, when added to the audio stream, prevents an eavesdropper from automatically detecting the conversation's topic. As shown in our experiments, the UAP is agnostic to the speaker and the audio length, and its volume can be adjusted in real time as needed. In a real-world demonstration, we use a Teensy microcontroller that acts as an external microphone and adds the UAP to the audio in real time. We examine different speakers, VoIP applications (Skype, Zoom), audio lengths, and speech-to-text models (Deep Speech, Kaldi). Our real-world results suggest that our approach is a feasible solution for privacy protection.
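To illustrate the deployment step described above, the following is a minimal sketch of how a precomputed UAP waveform might be mixed into a streaming audio signal with a volume factor that can be changed in real time. It is not the paper's implementation (the demonstration uses a Teensy microcontroller); the frame size, sample rate, and the `apply_uap` helper are illustrative assumptions.

```python
import numpy as np

def apply_uap(frame: np.ndarray, uap: np.ndarray, offset: int, volume: float) -> np.ndarray:
    """Mix a precomputed universal perturbation into one audio frame.

    frame:  float32 samples in [-1, 1] captured from the microphone
    uap:    fixed perturbation waveform, looped so it is agnostic to audio length
    offset: current position inside the looped perturbation
    volume: real-time scaling of the perturbation's loudness
    """
    idx = (offset + np.arange(len(frame))) % len(uap)
    noisy = frame + volume * uap[idx]
    return np.clip(noisy, -1.0, 1.0).astype(np.float32)

if __name__ == "__main__":
    # Example: perturb a stream of 20 ms frames at 16 kHz.
    rng = np.random.default_rng(0)
    uap = rng.uniform(-0.01, 0.01, size=16000).astype(np.float32)  # stand-in for a learned UAP
    offset = 0
    for _ in range(10):
        frame = rng.uniform(-0.5, 0.5, size=320).astype(np.float32)  # stand-in for captured audio
        out = apply_uap(frame, uap, offset, volume=0.8)
        offset = (offset + len(frame)) % len(uap)
        # `out` would be forwarded to the VoIP application in place of `frame`
```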