Training language models to learn from human instructions for zero-shot cross-task generalization has attracted much attention in the NLP community. Recently, instruction tuning (IT), which fine-tunes a pre-trained language model on a massive collection of tasks described via human-crafted instructions, has been shown effective for instruction learning on unseen tasks. However, IT relies on a large number of human-annotated samples, which restricts its generalization. Unlike labeled data, unlabeled data are often massive and cheap to obtain. In this work, we study how IT can be improved with unlabeled data. We first empirically explore how IT performance trends with the amount of labeled data, the number of instructions, and the number of training tasks. We find it critical to enlarge the number of training instructions, and that instructions can be underutilized due to the scarcity of labeled data. Then, we propose Unlabeled Data Augmented Instruction Tuning (UDIT), which takes better advantage of the instructions during IT by constructing pseudo-labeled data from unlabeled plain texts. We conduct extensive experiments to show UDIT's effectiveness across various task and dataset scenarios. We also comprehensively analyze the key factors of UDIT to investigate how to better improve IT with unlabeled data. The code is publicly available at https://github.com/thu-coai/UDIT.
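To make the idea of "constructing pseudo-labeled data from unlabeled plain texts" concrete, here is a minimal hypothetical sketch. It is not the paper's actual procedure; it only illustrates one simple self-supervised route, a cloze-style fill-in-the-blank task, for turning raw sentences into (instruction, input, output) triples usable for instruction tuning. The instruction wording and example corpus are assumptions.

```python
# Hypothetical sketch: build pseudo-labeled instruction-tuning examples from
# unlabeled plain text via a cloze-style task. This illustrates the general
# idea of UDIT-like augmentation, not the paper's exact construction method.
import random

# An assumed instruction template (not taken from the paper).
INSTRUCTION = "Fill in the blank in the following sentence with the missing word."

def make_cloze_example(sentence: str, rng: random.Random) -> dict:
    """Turn a plain-text sentence into an (instruction, input, output) triple
    by blanking out one randomly chosen word and using it as the label."""
    words = sentence.split()
    idx = rng.randrange(len(words))
    answer = words[idx]
    words[idx] = "____"
    return {
        "instruction": INSTRUCTION,
        "input": " ".join(words),
        "output": answer,
    }

# Toy stand-in for a large unlabeled corpus.
corpus = [
    "Instruction tuning fine-tunes a pretrained language model on many tasks.",
    "Unlabeled data are often massive and cheap to obtain.",
]
rng = random.Random(0)
pseudo_data = [make_cloze_example(s, rng) for s in corpus]
```

Because the labels come from the text itself, such data costs nothing to annotate, which is what lets the training instructions be exercised on far more examples than the labeled data alone would allow.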