This paper investigates methods to effectively retrieve speaker information from personalized speaker-adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models, where a global model is learned on the server from updates received from multiple clients. We propose an approach to analyzing the information in neural network AMs based on the footprint a network leaves on a so-called Indicator dataset. Using this method, we develop two attack models that aim to infer speaker identity from the updated personalized models without access to the actual users' speech data. Experiments on the TED-LIUM 3 corpus demonstrate that the proposed approaches are very effective and achieve an equal error rate (EER) of 1-2%.
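As a side note on the reported metric, the sketch below is a minimal, hypothetical illustration (not the paper's implementation) of how the equal error rate can be computed from attack-model similarity scores; the score distributions and function names are assumptions for demonstration only.

```python
# Hypothetical sketch: computing the equal error rate (EER) from
# same-speaker (genuine) and different-speaker (impostor) attack scores.
import numpy as np

def compute_eer(genuine_scores: np.ndarray, impostor_scores: np.ndarray) -> float:
    """Return the EER: the error rate at the threshold where the
    false-acceptance rate (FAR) equals the false-rejection rate (FRR)."""
    thresholds = np.sort(np.concatenate([genuine_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])  # impostors accepted
    frr = np.array([(genuine_scores < t).mean() for t in thresholds])    # genuine trials rejected
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2)

# Assumed usage with random scores standing in for attack-model outputs.
rng = np.random.default_rng(0)
genuine = rng.normal(1.0, 0.3, 500)    # same-speaker comparisons
impostor = rng.normal(0.0, 0.3, 5000)  # different-speaker comparisons
print(f"EER: {compute_eer(genuine, impostor):.2%}")
```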