Massively multilingual models are promising for transfer learning across tasks and languages. However, existing methods are unable to fully leverage training data when it is available in different task-language combinations. To exploit such heterogeneous supervision, we propose Hyper-X, a single hypernetwork that unifies multi-task and multilingual learning with efficient adaptation. This model generates weights for adapter modules conditioned on both task and language embeddings. By learning to combine task- and language-specific knowledge, our model enables zero-shot transfer to unseen languages and unseen task-language combinations. Our experiments on a diverse set of languages demonstrate that Hyper-X achieves the best or competitive gains when a mixture of multiple resources is available, while being on par with strong baselines in the standard scenario. Hyper-X is also considerably more efficient in terms of parameters and resources compared to methods that train separate adapters. Finally, Hyper-X consistently produces strong results in few-shot scenarios for new languages, showing the versatility of our approach beyond zero-shot transfer.
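To make the architecture concrete, below is a minimal sketch of the core idea in PyTorch: a hypernetwork that maps the concatenation of a task embedding and a language embedding to the flattened weights of a bottleneck adapter. All names (HyperAdapter, emb_dim, bottleneck) and dimensions here are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only; not the authors' code.
import torch
import torch.nn as nn

class HyperAdapter(nn.Module):
    """Hypernetwork that generates bottleneck-adapter weights from
    the concatenation of a task embedding and a language embedding."""

    def __init__(self, num_tasks, num_langs, emb_dim=64,
                 hidden_dim=768, bottleneck=48):
        super().__init__()
        self.task_emb = nn.Embedding(num_tasks, emb_dim)
        self.lang_emb = nn.Embedding(num_langs, emb_dim)
        self.hidden_dim = hidden_dim
        self.bottleneck = bottleneck
        self.down_size = hidden_dim * bottleneck  # down-projection weights
        self.up_size = bottleneck * hidden_dim    # up-projection weights
        # The hypernetwork: maps the (task, language) code to one flat
        # vector holding all adapter parameters.
        self.hyper = nn.Sequential(
            nn.Linear(2 * emb_dim, emb_dim),
            nn.ReLU(),
            nn.Linear(emb_dim, self.down_size + self.up_size),
        )

    def forward(self, h, task_id, lang_id):
        # h: [batch, seq, hidden_dim] transformer hidden states
        # task_id, lang_id: 0-dim LongTensors naming the current combination
        code = torch.cat([self.task_emb(task_id), self.lang_emb(lang_id)], dim=-1)
        flat = self.hyper(code)
        w_down = flat[: self.down_size].view(self.hidden_dim, self.bottleneck)
        w_up = flat[self.down_size:].view(self.bottleneck, self.hidden_dim)
        # Standard bottleneck adapter with a residual connection.
        return h + torch.relu(h @ w_down) @ w_up

# Usage: apply the generated adapter for task 0 in language 5.
adapter = HyperAdapter(num_tasks=2, num_langs=100)
out = adapter(torch.randn(8, 32, 768), torch.tensor(0), torch.tensor(5))
```

Because the adapter weights are a function of the (task, language) pair rather than stored per pair, an unseen combination at test time simply reuses the already-learned task and language embeddings, which is what enables the zero-shot transfer described above.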