Privacy is of particular importance in the financial domain, as such data is highly confidential and sensitive. Natural Language Processing (NLP) techniques can be applied to text classification and entity detection tasks in finance, such as sentiment analysis of customer feedback, invoice entity detection, and categorisation of financial documents by type. Due to the sensitive nature of this data, privacy measures must be taken when handling it and when training large models on it. In this work, we propose a contextualized transformer (BERT and RoBERTa) based text classification model integrated with privacy features such as Differential Privacy (DP) and Federated Learning (FL). We present how to privately train NLP models with desirable privacy-utility tradeoffs and evaluate them on the Financial Phrase Bank dataset.
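To make the DP training setup concrete, the following is a minimal sketch of differentially private fine-tuning of a BERT sentiment classifier with DP-SGD. The library choices (HuggingFace Transformers plus Opacus), the hyperparameters, and the toy input sentences are illustrative assumptions, not the exact configuration used in this work.

```python
# Hedged sketch: DP-SGD fine-tuning of BERT for 3-class financial sentiment.
# Library choices and hyperparameters are assumptions for illustration only.
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import BertForSequenceClassification, BertTokenizerFast
from opacus import PrivacyEngine

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=3)

# Toy batch standing in for Financial Phrase Bank sentences (positive/neutral/negative).
texts = ["Operating profit rose clearly year-on-year", "Sales decreased due to weak demand"]
labels = torch.tensor([0, 2])
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels), batch_size=2)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Attach DP-SGD: per-sample gradient clipping plus Gaussian noise on the aggregated gradients.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,  # illustrative value; governs the privacy-utility tradeoff
    max_grad_norm=1.0,     # per-sample gradient clipping bound
)

model.train()
for input_ids, attention_mask, y in loader:
    optimizer.zero_grad()
    out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
    out.loss.backward()
    optimizer.step()

# Report the privacy budget spent so far at a chosen delta.
print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```

In a federated variant, each client would run such DP-SGD steps locally on its own data and only share model updates with the server for aggregation; the sketch above covers only the centralized DP component.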