With the growing availability of smart devices and cloud services, personal speech assistants are increasingly used in everyday life. Most devices forward voice recordings to a central server, which uses them to improve the recognition model. This raises major privacy concerns, since private data could be misused by the server or by third parties. Federated learning is a decentralized optimization strategy that has been proposed to address such concerns. With this approach, private data is used for training on the device itself. Afterwards, the updated model parameters are sent to the server to improve the global model, which is then redistributed to the clients. In this work, we implement federated learning for speech recognition in a hybrid and an end-to-end model. We discuss the outcomes of these systems, which are very similar to each other and show only small improvements, pointing to the need for a deeper understanding of federated learning for speech recognition.
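The training loop described in the abstract (on-device training, parameter upload, server-side aggregation, redistribution) can be illustrated with a minimal sketch. The abstract does not state which aggregation rule is used, so the example below assumes a size-weighted parameter average in the style of federated averaging (FedAvg). It also replaces the speech recognizer with a toy one-dimensional linear model; all function names, hyperparameters, and data are hypothetical and chosen only for illustration.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Client step: a few SGD epochs on private data for the toy model y = w*x + b."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)

def federated_average(client_weights, client_sizes):
    """Server step (assumed FedAvg-style): average parameters, weighted by data size."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)

def run_rounds(clients, rounds=20, init=(0.0, 0.0)):
    """Repeat: send global model to clients, train locally, aggregate on the server."""
    global_weights = init
    for _ in range(rounds):
        # Only parameters leave the clients; the raw data never does.
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights

def make_client(rng, n, true_w=2.0, true_b=-1.0):
    """Hypothetical private dataset: noiseless samples of y = true_w*x + true_b."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x + true_b) for x in xs]

rng = random.Random(0)
clients = [make_client(rng, n) for n in (20, 35, 50)]
w, b = run_rounds(clients)
print(f"global model after training: w={w:.3f}, b={b:.3f}")
```

In this sketch the server only ever sees the `(w, b)` tuples returned by `local_update`, which mirrors the privacy argument above. A real speech recognizer would exchange full network parameter tensors instead of two scalars.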