Federated Learning aims to learn machine learning models from multiple decentralized edge devices (e.g., mobile phones) or servers without sacrificing local data privacy. Recent Natural Language Processing techniques rely on deep learning and large pre-trained language models. However, both deep neural networks and large language models are trained on huge amounts of data, which often resides on the server side. Since text data largely originates from end users, in this work we survey recent NLP models and techniques that use federated learning as the learning framework. Our survey discusses the major challenges in federated natural language processing, including algorithmic challenges, system challenges, and privacy issues. We also provide a critical review of existing federated NLP evaluation methods and tools. Finally, we highlight current research gaps and future directions.