Few-shot transfer often yields substantial gains over zero-shot transfer~\cite{lauscher2020zero}, offering a practically useful trade-off between fully supervised and unsupervised learning for systems based on multilingual pretrained models. This paper explores strategies for selecting data for annotation that lead to better few-shot transfer. The proposed approaches rely on multiple measures, such as data entropy under an $n$-gram language model, predictive entropy, and gradient embedding. We also propose a loss embedding method for sequence labeling tasks, which induces diversity and uncertainty sampling similar to gradient embedding. The proposed data selection strategies are evaluated and compared on POS tagging, NER, and NLI tasks covering up to 20 languages. Our experiments show that the gradient and loss embedding-based strategies consistently outperform random data selection baselines, with gains varying with the initial zero-shot transfer performance. Furthermore, the proposed methods show similar improvement trends even when the model used for zero-shot transfer is fine-tuned on a smaller proportion of the original task-specific labeled training data.
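For concreteness, one of the uncertainty measures mentioned above, predictive entropy, can be sketched with standard notation (ours, not taken verbatim from the paper): for a candidate unlabeled instance $x$ and a model distribution $p_\theta(y \mid x)$ over the label set $\mathcal{Y}$,
\[
H(x) \;=\; -\sum_{y \in \mathcal{Y}} p_\theta(y \mid x)\, \log p_\theta(y \mid x),
\]
and a simple uncertainty-sampling strategy spends the few-shot annotation budget on the target-language instances with the highest $H(x)$.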