Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between a model's intermediate latent representations and its output. It can be used to effectively initialize the encoder of an Automatic Speech Recognition (ASR) model. We present a novel modification of CPC called Guided Contrastive Predictive Coding (GCPC), which maximizes the mutual information between representations from a prior-knowledge model and the output of the model being pre-trained, allowing prior knowledge to be injected during pre-training. We validate our method on three ASR tasks: German, French and English. Our method outperforms CPC pre-training on all three datasets: relative to training from scratch, it reduces the Word Error Rate (WER) by 4.44%, 6.55% and 15.43% on the German, French and English (LibriSpeech) tasks respectively, whereas CPC pre-training brings only 2.96%, 1.01% and 14.39% relative WER reduction.
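To make the objective concrete, below is a minimal sketch of the InfoNCE-style contrastive loss that CPC-family methods optimize, written in PyTorch. The names (`guided_infonce`, `z_model`, `z_prior`) are hypothetical illustrations, not the authors' implementation: in standard CPC the positive targets would be the model's own future latent frames, while in the guided variant described above they would instead come from the prior-knowledge model.

```python
import torch
import torch.nn.functional as F

def guided_infonce(z_model: torch.Tensor, z_prior: torch.Tensor,
                   temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss between the pre-trained model's outputs and
    prior-knowledge-model representations (a sketch, not the paper's code).

    z_model: (batch, dim) outputs of the model being pre-trained.
    z_prior: (batch, dim) representations from the prior-knowledge model,
             aligned so that z_prior[i] is the positive for z_model[i].
    """
    z_model = F.normalize(z_model, dim=-1)
    z_prior = F.normalize(z_prior, dim=-1)
    # Similarity of every model output against every prior representation;
    # the diagonal holds the positive pairs, off-diagonals act as negatives.
    logits = z_model @ z_prior.t() / temperature
    targets = torch.arange(z_model.size(0), device=z_model.device)
    # Cross-entropy over in-batch negatives: minimizing this InfoNCE loss
    # maximizes a lower bound on the mutual information between the two
    # representation streams.
    return F.cross_entropy(logits, targets)
```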