Noise robustness in keyword spotting remains a challenge: many models fail to overcome the heavy influence of noise, which degrades the quality of their feature embeddings. We propose a contrastive regularization method, Inter-Intra Contrastive Regularization (I2CR), that improves feature representations by guiding the model to learn the fundamental speech information specific to each cluster. The method maximizes similarity across intra and inter samples of the same class. As a result, it pulls instances closer to more generalized representations that form more prominent clusters, reducing the adverse impact of noise. We show that our method provides consistent accuracy improvements across different backbone model architectures and noise environments. We also demonstrate that the proposed framework improves accuracy on unseen out-of-domain noises and on unseen signal-to-noise ratios (SNRs). These results indicate an overall improvement in noise robustness.
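The idea of maximizing similarity across intra and inter samples of the same class can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the loss, so the sketch assumes that "intra" pairs are a clean utterance and its own noisy view, that "inter" pairs are different utterances sharing a class label, and that similarity is cosine similarity. The function name `i2cr_style_regularizer` and its arguments are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def i2cr_style_regularizer(clean, noisy, labels):
    """Hypothetical inter/intra similarity regularizer (an assumption, not the paper's loss).

    clean, noisy: lists of embeddings; noisy[i] is a noisy view of utterance i.
    labels: class label of each utterance.
    Returns a value that shrinks as same-class embeddings become more similar.
    """
    n = len(labels)
    # Intra pairs: each clean embedding against its own noisy view.
    intra = [cosine(clean[i], noisy[i]) for i in range(n)]
    # Inter pairs: a clean embedding against the noisy view of a different
    # utterance that carries the same class label.
    inter = [
        cosine(clean[i], noisy[j])
        for i in range(n)
        for j in range(n)
        if i != j and labels[i] == labels[j]
    ]
    intra_term = 1.0 - sum(intra) / len(intra)
    # A batch may contain no two utterances of the same class.
    inter_term = 1.0 - sum(inter) / len(inter) if inter else 0.0
    return intra_term + inter_term


if __name__ == "__main__":
    clean = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    noisy = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
    print(i2cr_style_regularizer(clean, noisy, labels=[0, 0, 1]))
```

In training, a term of this kind would presumably be added to the classification loss with a weighting factor, so that minimizing the total loss also pulls same-class embeddings together; the weighting and the exact pairing scheme are left open here because the abstract does not specify them.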