Since the advent of Federated Learning (FL), researchers have applied FL methods to natural language processing (NLP) tasks. Despite a plethora of papers on FL for NLP, no prior work has studied how multilingual text affects FL algorithms. Furthermore, multilingual text offers an interesting avenue for examining the impact of non-IID text (e.g., different languages) on FL with naturally occurring data. We explore three multilingual language tasks (language modeling, machine translation, and text classification) using a range of federated and non-federated learning algorithms. Our results show that using pretrained models mitigates the negative effects of FL, helping models perform on par with or better than centralized (no-privacy) learning, even under non-IID partitioning.