Speech representations learned from self-supervised learning (SSL) models can benefit various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, which incurs considerable memory usage and human labor. Recently, prompting in Natural Language Processing (NLP) has been found to be an efficient technique for leveraging pre-trained language models (LMs). Specifically, prompt tuning optimizes a limited number of task-specific parameters with a fixed pre-trained model; as a result, only a small set of parameters needs to be stored for each task. Prompt tuning improves computation and memory efficiency by leveraging the pre-trained LM's prediction ability. Nevertheless, this paradigm has been little studied in the speech community. In this paper, we report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM). Experimental results show that the prompt tuning technique achieves competitive performance on speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models. We further study the technique on more challenging sequence generation tasks, where prompt tuning also demonstrates its potential; its limitations and possible research directions are discussed in this paper. The source code is available at https://github.com/ga642381/SpeechPrompt.
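As a rough illustration of the prompt tuning idea described above, the following PyTorch sketch prepends a small set of trainable prompt embeddings to the input of a frozen pre-trained backbone so that only the prompt parameters are optimized and stored per task. The class name `PromptTunedModel`, the `backbone` interface (a model that accepts input embeddings), and all shapes are illustrative assumptions, not the paper's actual GSLM implementation.

```python
import torch
import torch.nn as nn


class PromptTunedModel(nn.Module):
    """Minimal prompt tuning sketch: a frozen backbone plus trainable prompt embeddings.

    `backbone` is a hypothetical placeholder for a pre-trained LM whose forward
    pass accepts input embeddings; the paper's GSLM pipeline (unit-based speech
    tokens, verbalizer, etc.) is more involved than this sketch.
    """

    def __init__(self, backbone: nn.Module, embed_dim: int, prompt_len: int = 10):
        super().__init__()
        self.backbone = backbone
        # Freeze every pre-trained parameter; only the prompt is updated.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Task-specific trainable prompt: prompt_len x embed_dim parameters.
        self.prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prompt embeddings to the input sequence and run the frozen LM.
        return self.backbone(torch.cat([prompt, input_embeds], dim=1))


# Only the prompt receives gradients, so per-task storage is just
# prompt_len * embed_dim floats rather than a full fine-tuned model, e.g.:
#   model = PromptTunedModel(pretrained_lm, embed_dim=768)
#   optimizer = torch.optim.Adam([model.prompt], lr=1e-3)
```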