Machine reading comprehension (MRC) of text data is an important task in natural language understanding. It is a complex NLP problem with substantial ongoing research, fueled by the release of the Stanford Question Answering Dataset (SQuAD) and the Conversational Question Answering dataset (CoQA). It can be seen as an effort to teach computers to "understand" a text and then answer questions about it using deep learning. However, until now, large-scale training on private text data and knowledge sharing have been missing for this NLP task. Hence, we present FedQAS, a privacy-preserving machine reading system capable of leveraging large-scale private data without the need to pool those datasets in a central location. The proposed approach combines transformer models and federated learning technologies. The system is developed using the FEDn framework and deployed as a proof-of-concept alliance initiative. FedQAS is flexible, language-agnostic, and allows intuitive participation in and execution of local model training. In addition, we present the architecture and implementation of the system and provide a reference evaluation based on the SQuAD dataset, showcasing how FedQAS overcomes data privacy issues and enables knowledge sharing between alliance members in a federated learning setting.
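To make the federated learning idea behind this setup concrete, the following is a minimal sketch of federated averaging (FedAvg), the standard aggregation scheme used by frameworks such as FEDn: each alliance member trains locally on its private data, and only model parameters (never raw text) are combined at the server, weighted by local dataset size. This is an illustrative sketch only, not the FedQAS or FEDn implementation; the function name `fed_avg` and the toy data are assumptions for the example.

```python
# Minimal FedAvg sketch: weighted averaging of client model parameters.
# Raw training data never leaves a client; only parameters are shared.
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Aggregate client models by a data-size-weighted average.

    client_weights: list (one entry per client) of lists of np.ndarray,
                    where each inner list holds the model's layer parameters
    client_sizes:   number of local training examples per client
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        # Weight each client's layer by its share of the total data.
        layer_avg = sum(
            w[layer] * (n / total)
            for w, n in zip(client_weights, client_sizes)
        )
        averaged.append(layer_avg)
    return averaged

# Two toy "clients", each holding a single parameter vector.
clients = [[np.array([1.0, 2.0])], [np.array([3.0, 4.0])]]
sizes = [1, 3]  # the second client has three times more local data
global_model = fed_avg(clients, sizes)
print(global_model[0])  # -> [2.5 3.5]
```

In a real deployment the averaged parameters are redistributed to clients for the next training round, so knowledge is shared across the alliance while each member's text stays on-premises.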