Paralinguistic Privacy Protection at the Edge

Voice user interfaces and digital assistants are rapidly entering our lives and becoming singular touch points spanning our devices. These always-on services capture and transmit our audio data to powerful cloud services for further processing and subsequent actions. Our voices and the raw audio signals collected through these devices contain a host of sensitive paralinguistic information that is transmitted to service providers whether the trigger was deliberate or false. Because emotional patterns and sensitive attributes such as identity, gender, and mental well-being are easily inferred using deep acoustic models, using these services exposes us to a new generation of privacy risks. One approach to mitigating the risk of paralinguistic-based privacy breaches is to combine cloud-based processing with privacy-preserving, on-device learning and filtering of paralinguistic information before voice data is transmitted. In this paper, we introduce EDGY, a configurable, lightweight, disentangled representation learning framework that transforms and filters high-dimensional voice data to identify and contain sensitive attributes at the edge prior to offloading to the cloud. We evaluate EDGY's on-device performance and explore optimization techniques, including model quantization and knowledge distillation, to enable private, accurate, and efficient representation learning on resource-constrained devices. Our results show that EDGY runs in tens of milliseconds, with a 0.2% relative improvement in ABX score or only minimal performance penalties in learning linguistic representations from raw voice signals, using a CPU and a single-core ARM processor without specialized hardware.
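
The abstract names model quantization as one of the optimization techniques but does not describe the scheme EDGY uses. As a rough illustration of the general idea only, the sketch below applies symmetric per-tensor int8 quantization to a list of float weights and measures the reconstruction error. The function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of float weights to the int8 range.

    Illustrative assumption: one scale for the whole tensor, values
    clamped to [-127, 127]. This is not EDGY's documented scheme.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in quantized]


# Hypothetical weights standing in for one layer of an on-device model.
weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of four, at the cost of an error of at most half a quantization step per weight. That trade-off is what makes quantization attractive on resource-constrained devices.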
