Self-supervised model pre-training has recently garnered significant interest, but relatively few efforts have explored using additional resources when fine-tuning these models. We demonstrate how universal phoneset acoustic models can leverage cross-lingual supervision to improve the transfer of pre-trained self-supervised representations to new languages. We also show how target-language text can be used to enable and improve fine-tuning with the lattice-free maximum mutual information (LF-MMI) objective. In three low-resource languages, these techniques greatly improved few-shot learning performance.
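For context, a minimal sketch of the sequence-level MMI criterion that LF-MMI optimizes is given below; the notation ($\mathbf{O}_u$, $W_u$, $\theta$) is standard and not taken from this abstract. In the lattice-free variant, the denominator sum is computed exactly over a phone-level denominator graph rather than approximated with word lattices.

$$
\mathcal{F}_{\mathrm{MMI}}(\theta) \;=\; \sum_{u} \log \frac{p_{\theta}(\mathbf{O}_u \mid W_u)\, P(W_u)}{\sum_{W} p_{\theta}(\mathbf{O}_u \mid W)\, P(W)}
$$

Here $\mathbf{O}_u$ is the acoustic observation sequence for utterance $u$, $W_u$ is its reference transcript, and the denominator sums over competing hypotheses $W$.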