With the increasing popularity of voice-based applications, acoustic eavesdropping has become a serious threat to users' privacy. While on smartphones access to the microphone requires explicit user permission, acoustic eavesdropping attacks can rely on motion sensors (such as the accelerometer and gyroscope), whose access is unrestricted. However, previous instances of such attacks can only recognize a limited set of pre-trained words or phrases. In this paper, we present AccEar, an accelerometer-based acoustic eavesdropping attack that can reconstruct any audio played on the smartphone's loudspeaker with unconstrained vocabulary. We show that an attacker can employ a conditional Generative Adversarial Network (cGAN) to reconstruct high-fidelity audio from low-frequency accelerometer signals. The presented cGAN model learns to recreate the high-frequency components of the user's voice from low-frequency accelerometer signals through spectrogram enhancement. We assess the feasibility and effectiveness of the AccEar attack in a thorough set of experiments using audio from 16 public figures. As shown by the results of both objective and subjective evaluations, AccEar successfully reconstructs users' speech from accelerometer signals in different scenarios, including varying sampling rates, audio volumes, and device models.
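The spectrogram-enhancement framing can be illustrated with a minimal sketch. It assumes an accelerometer sampling rate of 500 Hz and 16 kHz audio (illustrative values, not taken from the paper): the short-time Fourier transform of the accelerometer trace only has bins up to its Nyquist limit of 250 Hz, whereas the audio spectrogram extends to 8 kHz. Choosing windows and hops of equal duration in both domains (32 ms and 16 ms, also assumptions) places both spectrograms on a common time-frequency grid. The helper `magnitude_spectrogram` and the toy two-tone signal are hypothetical; this is not the AccEar preprocessing pipeline, and no cGAN is included.

```python
import numpy as np

def magnitude_spectrogram(signal, fs, n_fft, hop):
    """Magnitude STFT with a Hann window (hypothetical helper).

    Returns (freqs, spec): freqs runs from 0 Hz up to fs / 2 (the Nyquist
    limit) and spec has shape (n_fft // 2 + 1, n_frames).
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1)).T
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, spec

# Illustrative sampling rates (assumptions, not from the paper).
AUDIO_FS, ACC_FS = 16_000, 500
t_audio = np.arange(AUDIO_FS) / AUDIO_FS   # 1 s of audio samples
t_acc = np.arange(ACC_FS) / ACC_FS         # 1 s of accelerometer samples

# Speech-like toy signal: a 150 Hz "fundamental" plus a 3 kHz component.
audio = np.sin(2 * np.pi * 150 * t_audio) + 0.5 * np.sin(2 * np.pi * 3000 * t_audio)
# Idealized sensor (assumption): it only captures the component below 250 Hz.
acc = np.sin(2 * np.pi * 150 * t_acc)

# 32 ms windows with a 16 ms hop in both domains give the same 31.25 Hz
# bin spacing and the same frame times.
freqs_audio, audio_spec = magnitude_spectrogram(audio, AUDIO_FS, n_fft=512, hop=256)
freqs_acc, acc_spec = magnitude_spectrogram(acc, ACC_FS, n_fft=16, hop=8)

print(acc_spec.shape, audio_spec.shape)  # (9, 61) (257, 61)
print(freqs_acc[-1], freqs_audio[-1])    # Nyquist limits of each spectrogram
```

Under these assumptions the accelerometer spectrogram covers only the 9 lowest bins (0 to 250 Hz) of a 257-bin audio grid. In a cGAN setup of the kind the abstract describes, such a low-band spectrogram would act as the condition, and the generator would have to synthesize the missing high-frequency bins, such as the 3 kHz component the sensor never observed.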