Prompt tuning (PT) is a promising parameter-efficient method for utilizing extremely large pre-trained language models (PLMs): it can achieve performance comparable to full-parameter fine-tuning by tuning only a few soft prompts. However, compared to fine-tuning, PT empirically requires many more training steps. To explore whether we can improve the efficiency of PT by reusing trained soft prompts and sharing learned knowledge, we empirically investigate the transferability of soft prompts across different tasks and models. In cross-task transfer, we find that trained soft prompts transfer well to similar tasks and can initialize PT for those tasks, accelerating training and improving performance. Moreover, to explore which factors influence prompts' transferability across tasks, we investigate how to measure prompt similarity and find that the overlapping rate of activated neurons correlates strongly with transferability. In cross-model transfer, we explore how to project the prompts of one PLM onto another PLM and successfully train a projector that achieves non-trivial transfer performance on similar tasks. However, initializing PT with the projected prompts does not work well, which may be caused by optimization preferences and PLMs' high redundancy. Our findings show that improving PT with knowledge transfer is possible and promising, while prompts' cross-task transferability is generally better than their cross-model transferability.
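To make the activated-neuron overlap measure mentioned above concrete, the sketch below shows one plausible way to compute it. This is a minimal illustration, not the paper's exact procedure: the function names (`activated_neurons`, `neuron_overlap_rate`), the choice of a single frozen feed-forward layer, and the "any prompt token drives the neuron above zero" activation criterion are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def activated_neurons(prompt: torch.Tensor, ffn_in: nn.Linear,
                      act: nn.Module = nn.ReLU()) -> torch.Tensor:
    """Boolean mask of FFN neurons activated by the soft prompt tokens.

    prompt: (prompt_len, hidden_size) trained soft prompt embeddings
    ffn_in: the frozen first linear layer of a transformer FFN block
    """
    hidden = act(ffn_in(prompt))      # (prompt_len, ffn_size)
    # Assumption: a neuron counts as "activated" if any prompt token
    # drives it above zero after the nonlinearity.
    return (hidden > 0).any(dim=0)    # (ffn_size,)

def neuron_overlap_rate(prompt_a: torch.Tensor, prompt_b: torch.Tensor,
                        ffn_in: nn.Linear) -> float:
    """Overlap rate = |A ∩ B| / |A ∪ B| of the two activated-neuron sets."""
    mask_a = activated_neurons(prompt_a, ffn_in)
    mask_b = activated_neurons(prompt_b, ffn_in)
    union = (mask_a | mask_b).sum().item()
    if union == 0:
        return 0.0
    return (mask_a & mask_b).sum().item() / union

if __name__ == "__main__":
    # Toy usage: random tensors stand in for trained prompts and a frozen layer.
    hidden_size, ffn_size, prompt_len = 768, 3072, 100
    ffn_in = nn.Linear(hidden_size, ffn_size)
    prompt_task_a = torch.randn(prompt_len, hidden_size)
    prompt_task_b = torch.randn(prompt_len, hidden_size)
    print(f"overlap rate: {neuron_overlap_rate(prompt_task_a, prompt_task_b, ffn_in):.3f}")
```

Under this reading, a higher overlap rate between the soft prompts trained for two tasks would be taken as a signal that one prompt is a good initialization for PT on the other task.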