NLU systems deployed in the real world are expected to be updated regularly by retraining or finetuning the underlying neural network on new training examples accumulated over time. In our work, we focus on the multilingual setting, where we want to further finetune a multilingual model on new training data for the same NLU task on which the model has already been trained. We show that under certain conditions, naively updating the multilingual model can lead to performance losses on a subset of languages even though the aggregated performance metric shows an improvement. We establish this phenomenon over four tasks belonging to three task families (token-level, sentence-level, and seq2seq) and find that the naive baseline is far from ideal for this setting. We then build upon recent advances in parameter-efficient finetuning to develop novel finetuning pipelines that jointly minimize catastrophic forgetting while encouraging positive cross-lingual transfer, thereby improving the spread of gains across languages while reducing the losses incurred in this setup.
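The abstract does not pin down a specific mechanism, but as a rough illustration of the kind of parameter-efficient update involved, the sketch below freezes the already-trained weights and learns only a small LoRA-style low-rank correction on the new data, which is one common way to limit forgetting of the original model. This is a minimal sketch in plain PyTorch with a toy placeholder model and placeholder data, not the paper's actual pipeline.

```python
# Minimal, illustrative sketch (not the paper's exact method): continue training
# on newly accumulated data while keeping the deployed model's weights frozen and
# learning only a small set of LoRA-style low-rank parameters.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear with a trainable low-rank update: W x + (alpha/r) * B A x."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # original weights stay fixed
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)


# Toy stand-in for the already-deployed multilingual NLU model.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 5))
model[0] = LoRALinear(model[0])              # inject the trainable low-rank path

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

# Placeholder batch standing in for "new training examples accumulated over time".
features = torch.randn(32, 768)
labels = torch.randint(0, 5, (32,))

for _ in range(3):                           # a few update steps on the new data
    optimizer.zero_grad()
    loss = loss_fn(model(features), labels)
    loss.backward()
    optimizer.step()
```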