Privacy against Real-Time Speech Emotion Detection via Acoustic Adversarial Evasion of Machine Learning

Emotional surveillance is an emerging area with wide-reaching privacy concerns. These concerns are exacerbated by ubiquitous IoT devices whose multiple sensors can support such surveillance use cases. The work presented here considers one such use case: a speech emotion recognition (SER) classifier tied to a smart speaker. This work demonstrates that black-box SER classifiers tied to a smart speaker can be evaded without compromising the utility of the smart speaker. This privacy concern is considered through the lens of adversarial evasion of machine learning. Our solution, Defeating Acoustic Recognition of Emotion via Genetic Programming (DARE-GP), uses genetic programming to generate non-invasive additive audio perturbations (AAPs). By constraining the evolution of these AAPs, transcription accuracy can be protected while SER classifier performance is simultaneously degraded. The additive nature of these AAPs, along with an approach that generates them for a fixed set of users in a manner independent of both the utterance and the user's location, supports real-time, real-world evasion of SER classifiers. DARE-GP's use of spectral features, which underlie the emotional content of speech, allows AAPs to transfer to previously unseen black-box SER classifiers. Further, DARE-GP outperforms state-of-the-art SER evasion techniques and is robust against defenses employed by a knowledgeable adversary. The evaluations in this work culminate with acoustic evaluations against two off-the-shelf commercial smart speakers, where a single AAP could evade a black-box classifier over 70% of the time. The final evaluation deployed AAP playback on a small-form-factor system (Raspberry Pi) integrated with a wake-word system to evaluate the efficacy of a real-world, real-time deployment in which DARE-GP is automatically invoked by the smart speaker's wake word.
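
The constrained evolution described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from DARE-GP itself: an AAP is modeled as a sum of sine tones given by hypothetical (frequency, amplitude) genes, `toy_ser` and `toy_transcription_ok` are stand-ins for a real SER classifier and a real transcription check, and a simple mutation-only evolutionary loop with elitist selection is swapped in for full genetic programming. The point is only to show how a fitness function can reward emotion misclassification while rejecting any perturbation that violates a transcription constraint.

```python
import math
import random

SAMPLE_RATE = 8000   # Hz; kept low so the toy example runs quickly
N_SAMPLES = 400      # 20 Hz resolution, so the tones below fall on exact bins
MAX_AMP = 1.0        # assumed loudness bound that keeps the AAP "non-invasive"
FREQ_CHOICES = list(range(100, 3000, 100))


def tone(freq_hz, amp):
    """One sine tone as a list of samples."""
    return [amp * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE)
            for t in range(N_SAMPLES)]


def render_aap(genes):
    """Sum the tones described by (frequency, amplitude) genes into one waveform."""
    out = [0.0] * N_SAMPLES
    for freq, amp in genes:
        for i, s in enumerate(tone(freq, amp)):
            out[i] += s
    return out


def band_energy(signal, freq_hz):
    """Energy at a single frequency (one DFT bin), a crude spectral feature."""
    w = 2 * math.pi * freq_hz / SAMPLE_RATE
    re = sum(s * math.cos(w * t) for t, s in enumerate(signal))
    im = sum(s * math.sin(w * t) for t, s in enumerate(signal))
    return (re * re + im * im) / len(signal)


def toy_ser(signal):
    """Hypothetical stand-in SER classifier: 'angry' if 2000 Hz dominates 500 Hz."""
    return "angry" if band_energy(signal, 2000) > band_energy(signal, 500) else "calm"


def toy_transcription_ok(aap):
    """Hypothetical transcription constraint: little AAP energy at 1000 Hz."""
    return band_energy(aap, 1000) < 1.0


def make_utterance(high_amp):
    """Toy 'angry' utterance: strong 2000 Hz, 1000 Hz 'speech content', weak 500 Hz."""
    parts = [tone(2000, high_amp), tone(1000, 0.5), tone(500, 0.1)]
    return [sum(s) for s in zip(*parts)]


UTTERANCES = [make_utterance(a) for a in (0.3, 0.4, 0.5)]


def fitness(genes, utterances=UTTERANCES):
    """Fraction of utterances whose label flips; zero if transcription would break."""
    aap = render_aap(genes)
    if not toy_transcription_ok(aap):
        return 0.0
    evaded = sum(toy_ser([u + p for u, p in zip(utt, aap)]) != "angry"
                 for utt in utterances)
    return evaded / len(utterances)


def mutate(genes, rng):
    """Change either the frequency or the amplitude of one randomly chosen tone."""
    genes = list(genes)
    i = rng.randrange(len(genes))
    freq, amp = genes[i]
    if rng.random() < 0.5:
        freq = float(rng.choice(FREQ_CHOICES))
    else:
        amp = min(MAX_AMP, max(0.0, amp + rng.uniform(-0.2, 0.2)))
    genes[i] = (freq, amp)
    return genes


def evolve(utterances=UTTERANCES, generations=15, pop_size=12, n_tones=2, seed=0):
    """Elitist evolutionary loop; returns the best genes and per-generation best fitness."""
    rng = random.Random(seed)
    pop = [[(float(rng.choice(FREQ_CHOICES)), rng.uniform(0.0, MAX_AMP))
            for _ in range(n_tones)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, utterances), reverse=True)
        history.append(fitness(pop[0], utterances))
        survivors = pop[: pop_size // 2]   # elitism: the top half survives unchanged
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    best = max(pop, key=lambda g: fitness(g, utterances))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("best genes:", best)
    print("best fitness per generation:", history)
```

Because the same perturbation is scored against every utterance in the set, a high-fitness individual in this sketch is utterance-independent by construction, which mirrors the motivation for a single reusable AAP. A real system would replace the toy classifier and constraint with actual SER and speech-to-text models.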
