Prompt tuning has become a new paradigm for model tuning, and it has demonstrated success in natural language pretraining and even vision pretraining. In this work, we explore the transfer of prompt tuning to multimodal pretraining, with a focus on generative multimodal pretrained models instead of contrastive ones. Specifically, we implement prompt tuning on a unified sequence-to-sequence pretrained model that adapts to both understanding and generation tasks. Experimental results demonstrate that lightweight prompt tuning can achieve performance comparable to finetuning and surpass other lightweight tuning methods. Besides, in comparison with finetuned models, prompt-tuned models demonstrate improved robustness against adversarial attacks. We further find that experimental factors, including the prompt length, prompt depth, and reparameterization, have a great impact on model performance, and we therefore empirically provide recommendations for the setup of prompt tuning. Despite the observed advantages, we still identify some limitations of prompt tuning, and we correspondingly point out directions for future studies. Code is available at \url{https://github.com/OFA-Sys/OFA}
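To make the mentioned factors concrete, the following is a minimal NumPy sketch of the core idea: learnable prompt vectors (optionally produced through a reparameterization network) are prepended to the input sequence of a frozen backbone. All sizes and the two-layer reparameterization MLP here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, prompt_len, seq_len, bottleneck = 8, 4, 6, 16

# Learnable prompt embeddings: the only trainable parameters; the
# pretrained backbone itself stays frozen during prompt tuning.
prompt = rng.normal(size=(prompt_len, d_model))

# Optional reparameterization: instead of optimizing the prompt matrix
# directly, pass it through a small MLP (hypothetical sizes).
W1 = rng.normal(size=(d_model, bottleneck))
W2 = rng.normal(size=(bottleneck, d_model))
reparam_prompt = np.tanh(prompt @ W1) @ W2

# Prepend the prompts to the token embeddings before the frozen encoder.
tokens = rng.normal(size=(seq_len, d_model))
inputs = np.concatenate([reparam_prompt, tokens], axis=0)
print(inputs.shape)  # (prompt_len + seq_len, d_model)
```

"Prompt depth" in the abstract refers to how many transformer layers receive such prompt vectors, rather than only the embedding layer.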