Speech model adaptation is crucial for handling the discrepancy between server-side proxy training data and the actual data received on users' local devices. Using federated learning (FL), we introduce an efficient approach for continuously adapting neural network language models (NNLMs) on private devices, with application to automatic speech recognition (ASR). To address potential speech transcription errors in the on-device training corpus, we perform empirical studies comparing various strategies for leveraging token-level confidence scores to improve NNLM quality in the FL setting. Experiments show that, compared with no model adaptation, the proposed method achieves relative word error rate (WER) reductions of 2.6% and 10.8% on two speech evaluation datasets, respectively. We also provide an analysis of the privacy guarantees of the presented procedure.
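As a minimal sketch of one such strategy, the snippet below weights each token's negative log-likelihood by its ASR confidence score, so that likely mis-transcribed tokens contribute less to the NNLM training loss. This is an illustrative example, not the paper's exact method; the function name and the pure-Python formulation are hypothetical.

```python
import math

def confidence_weighted_nll(token_probs, confidences, eps=1e-8):
    """Confidence-weighted negative log-likelihood (illustrative sketch).

    token_probs: model probability assigned to each target token.
    confidences: per-token ASR confidence scores in [0, 1];
                 low-confidence (possibly erroneous) tokens are down-weighted.
    """
    weighted_nll = sum(c * -math.log(p)
                       for p, c in zip(token_probs, confidences))
    total_weight = sum(confidences)
    return weighted_nll / max(total_weight, eps)

# With uniform confidences this reduces to the standard mean NLL.
loss = confidence_weighted_nll([0.5, 0.25], [1.0, 1.0])
```

In an actual FL round, each device would compute such a loss on its local transcriptions and send only model updates (not data) back to the server for aggregation.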