Crowdsourcing offers a faster and cheaper way to obtain new labeled data. In exchange, however, crowd annotations from non-experts may be of lower quality than those from experts. In this paper, we propose an approach to crowd annotation learning for Chinese Named Entity Recognition (NER) that makes full use of the noisy sequence labels provided by multiple annotators. Inspired by adversarial learning, our approach uses a common Bi-LSTM and a private Bi-LSTM to represent annotator-generic and annotator-specific information, respectively. The annotator-generic information is the common knowledge about entities that is easily mastered by the crowd. Finally, we build our Chinese NE tagger on the LSTM-CRF model. In our experiments, we create two data sets for Chinese NER tasks from two domains. The experimental results show that our system achieves better scores than strong baseline systems.
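To make the final tagging step more concrete, here is a minimal sketch of Viterbi decoding, the standard way a linear-chain CRF output layer selects a tag sequence. It assumes the emission scores (which in an LSTM-CRF model would come from the Bi-LSTM layers) and the transition scores are already given. The function name `viterbi_decode`, the three-tag `B`/`I`/`O` scheme, and all numeric scores are illustrative assumptions, not details from the paper.

```python
from typing import Dict, List


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]      : score of `tag` at position t (hypothetical values)
    transitions[prev][cur] : score of moving from tag `prev` to tag `cur`
    """
    if not emissions:
        return []
    # best[t][tag] = best score of any path that ends in `tag` at position t
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t - 1][tag] = previous tag on that best path
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[p][cur])
            scores[cur] = best[-1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Illustrative inputs: an O -> I transition is heavily penalized,
# because an entity cannot continue without having begun.
tags = ["B", "I", "O"]
transitions = {p: {c: 0.0 for c in tags} for p in tags}
transitions["O"]["I"] = -10.0
emissions = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.0, "O": 2.0},
]
decoded = viterbi_decode(emissions, transitions, tags)  # -> ['B', 'I', 'O']
```

Choosing the best tag independently at each position would yield `O, I, O` here, which is not a valid entity span. Decoding over the whole sequence lets the transition scores rule that out, which is the usual motivation for placing a CRF layer on top of the LSTM outputs.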