In this study, we explore efficient tuning methods for speech self-supervised learning. Recent studies show that self-supervised learning (SSL) can learn powerful representations for different speech tasks. However, fine-tuning pre-trained models for each downstream task is parameter-inefficient, since SSL models are notoriously large, with millions of parameters. Adapters are lightweight modules commonly used in NLP to address this problem. In downstream tasks, the parameters of the SSL model are frozen and only the adapters are trained. Given the lack of studies that broadly examine the effectiveness of adapters for self-supervised speech tasks, we intend to fill this gap by adding various adapter modules to pre-trained speech SSL models. We show that performance parity can be achieved with over 90% parameter reduction, and we discuss the pros and cons of efficient tuning techniques. This is the first comprehensive investigation of various adapter types across speech tasks.
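To make the adapter recipe concrete, the sketch below is a minimal illustration, assuming a PyTorch-style setup: a bottleneck adapter (down-projection, non-linearity, up-projection, residual connection) is trained while the pre-trained SSL backbone stays frozen. The module names (`backbone`, `adapters`, `head`) and sizes are hypothetical placeholders, not the specific configuration used in this work.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter: down-project, non-linearity, up-project,
    with a residual connection. Dimensions are illustrative."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

# Hypothetical stand-in for a pre-trained speech SSL encoder.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
    num_layers=2,
)
adapters = nn.ModuleList([BottleneckAdapter(768) for _ in range(2)])
head = nn.Linear(768, 10)  # e.g., a 10-class downstream classifier

# Freeze the SSL backbone; only adapter and task-head parameters receive gradients.
for p in backbone.parameters():
    p.requires_grad = False

trainable = [p for m in (adapters, head) for p in m.parameters()]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
```

Because only the adapters and the task head are optimized, the number of trainable parameters per downstream task is a small fraction of the full model, which is the source of the parameter savings discussed above.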