Although large language models have achieved impressive zero-shot abilities, their huge model sizes generally incur high costs. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform competitively with their fully-parametric counterparts in zero-shot generalization to downstream tasks. In this work, we introduce $\text{Zemi}$, a zero-shot semi-parametric language model. To the best of our knowledge, this is the first semi-parametric language model to demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train $\text{Zemi}$ with a novel semi-parametric multitask prompted training paradigm, which yields significant improvements over the parametric multitask training proposed by T0. Specifically, we augment both the multitask training and the zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus. To incorporate multiple potentially noisy retrieved augmentations, we further propose a novel $\text{augmentation fusion}$ module that leverages a perceiver resampler and gated cross-attention. Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% across all seven evaluation tasks while being 3.9x smaller in model size.
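The augmentation fusion described above can be pictured roughly as follows. This is a minimal, illustrative PyTorch sketch, not the paper's actual implementation: a perceiver resampler compresses variable-length retrieved passages into a fixed set of latents, and a gated cross-attention layer mixes those latents into the language model's hidden states. All class names, dimensions, and the single-layer structure are our own assumptions for exposition.

```python
# Minimal sketch (not the authors' code) of an augmentation fusion layer
# of the kind the abstract describes. All names and sizes are illustrative.
import torch
import torch.nn as nn


class PerceiverResampler(nn.Module):
    """Compress a variable number of retrieved-passage tokens into a
    fixed set of latent vectors via cross-attention (illustrative)."""
    def __init__(self, dim: int, num_latents: int = 64, num_heads: int = 8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, retrieved: torch.Tensor) -> torch.Tensor:
        # retrieved: (batch, num_aug * aug_len, dim) -> (batch, num_latents, dim)
        q = self.latents.unsqueeze(0).expand(retrieved.size(0), -1, -1)
        out, _ = self.attn(q, retrieved, retrieved)
        return out


class GatedCrossAttention(nn.Module):
    """Fuse resampled augmentation latents into LM hidden states; a tanh
    gate initialized at zero lets training start from the unaugmented LM,
    so noisy retrievals cannot disrupt the model early on."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0: no-op at init

    def forward(self, hidden: torch.Tensor, latents: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(hidden, latents, latents)
        return hidden + torch.tanh(self.gate) * attended


# Usage: fuse three noisy retrieved passages into the LM hidden states.
dim = 512
resampler, fusion = PerceiverResampler(dim), GatedCrossAttention(dim)
hidden = torch.randn(2, 128, dim)             # (batch, seq_len, dim) from the LM
retrieved = torch.randn(2, 3 * 256, dim)      # 3 passages x 256 tokens, encoded
fused = fusion(hidden, resampler(retrieved))  # same shape as `hidden`
```

The zero-initialized gate is a common design choice (used, e.g., in Flamingo-style gated cross-attention) for safely injecting external signals into a pretrained backbone; whether Zemi initializes its gates this way is an assumption here.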