Pre-trained language models have shown excellent results in few-shot learning scenarios using in-context learning. Although impressive, the size of these language models can make them impractical for on-device applications such as sensors or smartphones. With smaller language models, task-specific data annotation is needed to fine-tune the language model for a specific purpose. However, data annotation imposes a substantial financial and time burden on small research groups, startups, and even companies. In this paper, we analyze different prompt-based fine-tuning techniques to improve results on both language and multimodal causal transformer models. To evaluate our results, we use a dataset focusing on visual commonsense reasoning in time. Our results show that simple, model-agnostic prompt-based fine-tuning can reach comparable results while using only 35%-40% of the fine-tuning training dataset. The proposed approaches result in significant time and financial savings. As the proposed methods make minimal architectural assumptions, other researchers can apply them to their own transformer models with minimal adaptations. We plan to release the source code freely to make it easier for the community to use and contribute to our work.