Application of Data Encryption in Chinese Named Entity Recognition

Recently, with the continuous development of deep learning, performance on named entity recognition (NER) tasks has improved dramatically. However, privacy and confidentiality requirements in certain domains, such as biomedicine and the military, leave too little data available to train deep neural networks. In this paper, we propose an encryption learning framework to address data leakage and the difficulty of disclosing sensitive data in such domains. For the first time, we introduce multiple encryption algorithms to encrypt the training data of an NER task; in other words, we train deep neural networks on encrypted data. We conduct experiments on six Chinese datasets, three of which we constructed ourselves. The experimental results show that the encryption methods achieve satisfactory results: some models trained on encrypted data even outperform their counterparts trained on unencrypted data. This verifies the effectiveness of the introduced encryption methods and partially solves the problem of data leakage.
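The abstract does not say which encryption algorithms the framework uses. As a rough illustration of what "training on encrypted data" can mean for character-level Chinese NER, the sketch below assumes a simple keyed character-substitution cipher. The helper names (`build_substitution_key`, `encrypt_sentence`, `decrypt_sentence`), the secret key, and the BIO tag set are hypothetical and not taken from the paper. Because the mapping is one-to-one and length-preserving, each entity label stays aligned with its encrypted character, and because the same character always encrypts to the same symbol, a sequence labeler can still learn consistent embeddings from the ciphertext.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple


def build_substitution_key(vocab: List[str], secret: bytes) -> Dict[str, str]:
    """Derive a keyed one-to-one character mapping (hypothetical helper).

    Characters are re-ordered by their HMAC-SHA256 value under `secret`, and
    each character maps to the one at the same position in the re-ordered
    list. The result is a permutation of the vocabulary, so it is invertible,
    and the same secret always yields the same mapping. In practice `vocab`
    would be the character set of the whole training corpus.
    """
    chars = sorted(set(vocab))
    shuffled = sorted(
        chars,
        key=lambda c: hmac.new(secret, c.encode("utf-8"), hashlib.sha256).digest(),
    )
    return dict(zip(chars, shuffled))


def encrypt_sentence(
    chars: List[str], labels: List[str], key: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Encrypt one character-level NER example; labels are left unchanged.

    One character in, one character out, so label i still tags character i.
    """
    if len(chars) != len(labels):
        raise ValueError("each character needs exactly one label")
    return [key[c] for c in chars], list(labels)


def decrypt_sentence(chars: List[str], key: Dict[str, str]) -> List[str]:
    """Invert the substitution; only holders of the key can do this."""
    inverse = {v: k for k, v in key.items()}
    return [inverse[c] for c in chars]


if __name__ == "__main__":
    # Hypothetical example: one sentence with BIO tags for a person and a location.
    sentence = list("张三在北京工作")
    labels = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
    key = build_substitution_key(sentence, secret=b"demo-secret")
    enc_chars, enc_labels = encrypt_sentence(sentence, labels, key)
    print(list(zip(enc_chars, enc_labels)))
    assert decrypt_sentence(enc_chars, key) == sentence
```

A substitution cipher preserves character frequencies and is therefore open to frequency analysis. The sketch only illustrates the data flow (encrypt the corpus with a secret key, train a tagger on the ciphertext, and in this sketch keep the labels unencrypted); it is not a recommendation for protecting genuinely sensitive data.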

翻译：最近,随着深层学习的不断发展,指定实体识别任务的履行情况大为改善,然而,生物医学和军事等特定领域数据的隐私和保密性导致数据不足,无法支持深神经网络的培训。在本文件中,我们提议了一个加密学习框架,以解决数据泄漏和某些领域敏感数据不便披露的问题。我们首次采用多种加密算法,在指定实体识别任务中加密培训数据。换句话说,我们利用加密数据对深神经网络进行培训。我们进行了六套中国数据集的实验,其中三套是由我们自己建造的。实验结果表明,加密方法取得了令人满意的结果。一些经过加密数据培训的模型的性能甚至超过了未加密方法的性能,该方法验证了引入加密方法的有效性,并在一定程度上解决了数据泄漏问题。