Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a unified hypernetwork that generates weights for parameter-efficient adapter modules conditioned on both task and language embeddings. By learning to combine task- and language-specific knowledge, our model enables zero-shot transfer to unseen languages and unseen task-language combinations. Our experiments on a diverse set of languages demonstrate that Hyper-X achieves the largest gains when a mixture of multiple resources is available, while performing on par with strong baselines in the standard scenario. Finally, Hyper-X consistently produces strong results in few-shot scenarios for new languages and tasks, showing the effectiveness of our approach beyond zero-shot transfer.
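The core mechanism can be illustrated with a minimal sketch: a hypernetwork maps the concatenation of a task embedding and a language embedding to the flattened weights of a bottleneck adapter, so that any (task, language) pair, including ones never seen together during training, receives adapter parameters. All dimensions, names, and the single-linear-layer hypernetwork below are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (hypothetical, not taken from the paper).
d_model, d_bottleneck, d_emb = 16, 4, 8

# Learned embeddings for each task and language (random stand-ins here).
task_emb = {"pos": rng.normal(size=d_emb), "ner": rng.normal(size=d_emb)}
lang_emb = {"en": rng.normal(size=d_emb), "sw": rng.normal(size=d_emb)}

# Hypernetwork: a single linear map from the concatenated [task; language]
# embedding to the flattened adapter parameters (down- and up-projection).
n_params = d_model * d_bottleneck + d_bottleneck * d_model
W_hyper = rng.normal(size=(n_params, 2 * d_emb)) * 0.1

def generate_adapter(task, lang):
    """Generate adapter weights for one (task, language) combination."""
    source = np.concatenate([task_emb[task], lang_emb[lang]])
    flat = W_hyper @ source
    W_down = flat[: d_model * d_bottleneck].reshape(d_bottleneck, d_model)
    W_up = flat[d_model * d_bottleneck :].reshape(d_model, d_bottleneck)
    return W_down, W_up

def adapter_forward(h, W_down, W_up):
    """Bottleneck adapter with ReLU nonlinearity and residual connection."""
    return h + W_up @ np.maximum(W_down @ h, 0.0)

# Zero-shot composition: a (task, language) pair unseen during training
# (e.g. NER in Swahili) still gets adapter weights from the hypernetwork.
W_down, W_up = generate_adapter("ner", "sw")
h = rng.normal(size=d_model)
out = adapter_forward(h, W_down, W_up)
print(out.shape)
```

Because the hypernetwork is shared across all tasks and languages, supervision from any available task-language combination updates the same generator, which is what allows heterogeneous training data to be pooled.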