In the Internet of Things (IoT) era, voice assistants have become an important interface for operating smart speakers, smartphones, and even automobiles. To save power and protect user privacy, voice assistants send commands to the cloud only when one of a small set of pre-registered wake-up words is detected. However, voice assistants have been shown to be vulnerable to the FakeWake phenomenon, whereby they are inadvertently triggered by innocent-sounding fuzzy words. In this paper, we present a systematic investigation of the FakeWake phenomenon from three aspects. First, we design the first fuzzy word generator, which produces fuzzy words automatically and efficiently instead of searching through a large corpus of audio materials. Using it, we generate 965 fuzzy words covering the 8 most popular English and Chinese smart speakers. To explain the causes underlying the FakeWake phenomenon, we construct an interpretable tree-based decision model, which reveals the phonetic features that lead wake-up word detectors to falsely accept fuzzy words. Finally, we propose remedies to mitigate the effect of FakeWake. The results show that the strengthened models are not only resilient to fuzzy words but also achieve better overall performance on the original training datasets.