Recent developments in pre-trained speech representations using self-supervised learning (SSL) have yielded exceptional results on a variety of downstream tasks. One such technique, masked predictive coding (MPC), has been employed by some of the highest-performing models. In this study, we investigate the impact of the MPC loss on the type of information learned at various layers of the HuBERT model, using nine probing tasks. Our findings indicate that the amount of content information learned at various layers of the HuBERT model correlates positively with the MPC loss. We also observe that speaker-related information learned at the intermediate layers of the model is an indirect consequence of the learning process and therefore cannot be controlled through the MPC loss. These findings may inspire further research in the speech community, specifically the development of new pre-training tasks or the exploration of new pre-training criteria that directly preserve both speaker and content information at various layers of a learned model.
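To make the layer-wise probing setup concrete, the sketch below shows one common way to extract per-layer HuBERT representations and fit a simple linear probe on them. It is a minimal illustration, assuming the HuggingFace `transformers` HubertModel checkpoint `facebook/hubert-base-ls960`; the paper's nine probing tasks, datasets, and probe architectures are not specified here, and `train_waves`, `train_labels`, `test_waves`, and `test_labels` are hypothetical placeholders.

```python
# Minimal layer-wise probing sketch for HuBERT (illustrative, not the
# paper's exact setup). Requires: torch, transformers, scikit-learn.
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor
from sklearn.linear_model import LogisticRegression

model = HubertModel.from_pretrained("facebook/hubert-base-ls960")
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model.eval()

def layer_features(waveform, sample_rate=16000):
    """Return one mean-pooled feature vector per transformer layer."""
    inputs = extractor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # out.hidden_states is a tuple of (1, T, D) tensors, one per layer
    # (the first entry is the CNN feature-encoder output).
    return [h.mean(dim=1).squeeze(0).numpy() for h in out.hidden_states]

def probe_layer(layer_idx, train_waves, train_labels, test_waves, test_labels):
    """Fit a linear probe on one layer's features and report test accuracy."""
    X_train = [layer_features(w)[layer_idx] for w in train_waves]
    X_test = [layer_features(w)[layer_idx] for w in test_waves]
    clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)
    return clf.score(X_test, test_labels)
```

Comparing `probe_layer` accuracies across layers for a content task (e.g., phone classification) versus a speaker task (e.g., speaker identification) is the standard way to localize where each kind of information is encoded.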