Increasing privacy concerns over personal text data have promoted the development of federated learning (FL) in recent years. However, existing studies on applying FL to NLP are not well suited to coordinating participants with heterogeneous or private learning objectives. In this study, we further broaden the application scope of FL in NLP by proposing an Assign-Then-Contrast (denoted as ATC) framework, which enables clients with heterogeneous NLP tasks to construct an FL course and learn useful knowledge from each other. Specifically, in the Assign training stage, clients first perform local training on unified tasks assigned by the server rather than on their own learning objectives. After that, in the Contrast training stage, clients train on their different local learning objectives and exchange knowledge with the clients that contribute consistent and useful model updates. We conduct extensive experiments on six widely-used datasets covering both Natural Language Understanding (NLU) and Natural Language Generation (NLG) tasks, and the proposed ATC framework achieves significant improvements over various baseline methods. The source code is available at \url{https://github.com/alibaba/FederatedScope/tree/master/federatedscope/nlp/hetero_tasks}.
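To make the two-stage flow concrete, below is a minimal toy sketch in Python. The vector-valued "models", the per-client task targets, the round counts, and the cosine-similarity weighting used in the Contrast stage are all illustrative assumptions for exposition; they are not the paper's exact objectives or the FederatedScope API, for which see the linked repository.

\begin{verbatim}
# Toy sketch of the two-stage ATC flow; all quantities are illustrative.
import numpy as np

rng = np.random.default_rng(0)
DIM, N_CLIENTS = 8, 4

# Toy "models" and per-client targets standing in for heterogeneous tasks.
models = [rng.normal(size=DIM) for _ in range(N_CLIENTS)]
own_targets = [rng.normal(size=DIM) for _ in range(N_CLIENTS)]
unified_target = rng.normal(size=DIM)  # server-assigned unified task

def local_update(model, target, lr=0.5):
    """One toy local-training step: move the model toward the task target."""
    return lr * (target - model)

# ---- Assign stage: every client trains on the server-assigned unified
# task, and the server averages the updates (FedAvg-style aggregation).
for _ in range(3):
    updates = [local_update(m, unified_target) for m in models]
    avg = np.mean(updates, axis=0)
    models = [m + avg for m in models]

# ---- Contrast stage: clients train on their OWN tasks; each client then
# aggregates peer updates weighted by cosine similarity to its own update,
# so knowledge flows mainly between clients with consistent updates.
for _ in range(3):
    updates = [local_update(m, t) for m, t in zip(models, own_targets)]
    for i in range(N_CLIENTS):
        sims = np.array([
            np.dot(updates[i], u)
            / (np.linalg.norm(updates[i]) * np.linalg.norm(u) + 1e-8)
            for u in updates
        ])
        w = np.clip(sims, 0.0, None)  # keep only consistent contributors
        w = w / w.sum() if w.sum() > 0 else np.ones(N_CLIENTS) / N_CLIENTS
        models[i] = models[i] + sum(wj * uj for wj, uj in zip(w, updates))

print("final models:\n", np.round(models, 2))
\end{verbatim}

In this sketch, each client always assigns itself full similarity weight, so the Contrast stage interpolates between purely local training and selective peer aggregation depending on how aligned the other clients' updates are.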