Machine reading comprehension (MRC) of text data is an important task in natural language understanding. It is a complex NLP problem with substantial ongoing research, fueled by the release of the Stanford Question Answering Dataset (SQuAD) and the Conversational Question Answering dataset (CoQA). It can be seen as an effort to teach computers to "understand" a text and then answer questions about it using deep learning. However, until now, large-scale training on private text data and knowledge sharing have been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation in and execution of local model training. In addition, we present the architecture and implementation of the system and provide a reference evaluation based on the SQuAD dataset, showcasing how FedQAS overcomes data privacy issues and enables knowledge sharing between alliance members in a federated learning setting.
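To make the federated learning idea behind this setup concrete, the following is a minimal sketch of federated averaging (FedAvg), the standard aggregation scheme used by frameworks such as FEDn: each alliance member trains locally on its private data, and only model parameters (never raw text) are combined at the server, weighted by local dataset size. This is an illustrative sketch only, not the FedQAS or FEDn implementation; the function name `fed_avg` and the toy data are assumptions for the example.

```python
# Minimal FedAvg sketch: weighted averaging of client model parameters.
# Raw training data never leaves a client; only parameters are shared.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Aggregate client models by a data-size-weighted average.

    client_weights: list (one entry per client) of lists of np.ndarray,
                    where each inner list holds the model's layer parameters
    client_sizes:   number of local training examples per client
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Weight each client's layer by its share of the total data.
        layer_avg = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_avg)
    return averaged

# Two toy "clients", each holding a single parameter vector.
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
sizes = [1, 3]  # the second client has three times more local data
global_model = fed_avg(clients, sizes)
print(global_model[0])  # -> [2.5 3.5]
```

In a real deployment the averaged parameters are redistributed to clients for the next training round, so knowledge is shared across the alliance while each member's text stays on-premises.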