Keyword spotting (KWS) aims to discriminate a specific wake-up word from other signals precisely and efficiently for different users. Recent works utilize various deep networks to train KWS models with all users' speech data centralized, without considering data privacy. Federated KWS (FedKWS) could serve as a solution that avoids directly sharing users' data. However, the small amount of data per user, different user habits, and various accents can lead to severe problems, e.g., overfitting or weight divergence. Hence, we propose several strategies to discourage the model from overfitting user-specific information in FedKWS. Specifically, we first propose an adversarial learning strategy, which updates the downloaded global model against an overfitted local model and explicitly encourages the global model to capture user-invariant information. Furthermore, we propose an adaptive local training strategy, letting clients with more training data and more uniform class distributions undertake more local update steps. Equivalently, this strategy weakens the negative impact of users whose data is less suited to training. Our proposed FedKWS-UI can thus learn user-invariant information in FedKWS both explicitly and implicitly. Extensive experimental results on federated Google Speech Commands verify the effectiveness of FedKWS-UI.
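The adaptive local training strategy described above can be sketched as a simple heuristic: a client's number of local update steps grows with its data size and with the uniformity of its label distribution (measured here by normalized entropy). This is a minimal illustrative sketch, not the paper's exact rule; the function name, the entropy-based uniformity score, and all constants (`base_steps`, `max_steps`, the data-size cap of 1000) are assumptions for illustration.

```python
import math
from collections import Counter

def local_update_steps(labels, base_steps=5, max_steps=20, num_classes=10):
    """Hypothetical sketch: scale a client's local steps by its data size
    and the uniformity (normalized entropy) of its label distribution."""
    counts = Counter(labels)
    n = len(labels)
    probs = [c / n for c in counts.values()]
    # Normalized entropy in [0, 1]: 1 means a perfectly uniform class distribution.
    entropy = -sum(p * math.log(p) for p in probs)
    uniformity = entropy / math.log(num_classes)
    # Assumed scaling: clients with more data (capped at 1000 samples here)
    # and more uniform labels get more local steps.
    size_factor = min(1.0, n / 1000)
    return base_steps + round((max_steps - base_steps) * uniformity * size_factor)
```

Under this sketch, a client with a balanced label distribution receives more local steps than one whose data is dominated by a single class, which matches the abstract's goal of down-weighting clients with less qualified data.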