Pretrained multilingual encoders enable zero-shot cross-lingual transfer, but often produce unreliable models that exhibit high performance variance on the target language. We postulate that this high variance results from zero-shot cross-lingual transfer solving an under-specified optimization problem. We show that any linearly interpolated model between the source-language monolingual model and the source + target bilingual model has equally low source-language generalization error, yet the target-language generalization error decreases smoothly and linearly as we move from the monolingual to the bilingual model, suggesting that the model struggles to identify solutions that are good for both source and target languages using the source language alone. Additionally, we show that the zero-shot solution lies in a non-flat region of the target-language generalization error surface, causing the high variance.
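The interpolation the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: `theta_mono` and `theta_bi` are hypothetical parameter dictionaries standing in for the monolingual and bilingual model checkpoints, and the interpolation coefficient `alpha` sweeps from the monolingual (0) to the bilingual (1) endpoint.

```python
import numpy as np

def interpolate_params(theta_mono, theta_bi, alpha):
    """Linearly interpolate two models' parameters:
    theta(alpha) = (1 - alpha) * theta_mono + alpha * theta_bi.
    Both inputs are dicts mapping parameter names to arrays of equal shape."""
    return {name: (1 - alpha) * theta_mono[name] + alpha * theta_bi[name]
            for name in theta_mono}

# Toy stand-in parameters (real models would have many tensors per layer).
theta_mono = {"w": np.array([1.0, 0.0]), "b": np.array([2.0])}
theta_bi   = {"w": np.array([0.0, 1.0]), "b": np.array([0.0])}

# Evaluating target-language error at each point along this path is what
# reveals the smooth, linear decrease reported in the abstract.
for alpha in np.linspace(0.0, 1.0, 5):
    theta = interpolate_params(theta_mono, theta_bi, alpha)
```

In practice one would load two fine-tuned encoder checkpoints, interpolate every weight tensor with the same `alpha`, and evaluate the resulting model on held-out source- and target-language data at each step.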