Chinese named entity recognition (NER) is difficult because Chinese characters are ambiguous and the text has no explicit word boundaries. Previous work on Chinese NER focuses on lexicon-based methods, which introduce boundary information and reduce out-of-vocabulary (OOV) cases during prediction. However, high-quality lexicons are expensive to obtain and to maintain dynamically in specific domains, which motivates us to use more general knowledge resources, e.g., search engines. In this paper, we propose TURNER: The Uncertainty-based Retrieval framework for Chinese NER. The idea behind TURNER is to imitate human behavior: when we encounter an unknown or uncertain entity, we frequently retrieve auxiliary knowledge for assistance. To improve the efficiency and effectiveness of retrieval, we first propose two types of uncertainty sampling methods for selecting the most ambiguous entity-level components of the input text. Then, the Knowledge Fusion Model re-predicts the uncertain samples by incorporating the retrieved knowledge. Experiments on four benchmark datasets demonstrate TURNER's effectiveness: it outperforms existing lexicon-based approaches and achieves new state-of-the-art (SOTA) results.
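The abstract does not say which two uncertainty sampling methods TURNER uses, so the following is only a minimal sketch of the general idea of entity-level uncertainty selection, assuming a tagger that outputs a per-token label distribution. Shannon entropy is swapped in here as one common uncertainty score; the function names, the mean-entropy aggregation, and the threshold are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple


def token_entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of one token's label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def uncertain_spans(
    spans: List[Tuple[int, int]],
    label_probs: List[List[float]],
    threshold: float,
) -> List[Tuple[int, int]]:
    """Return candidate spans whose mean token entropy exceeds `threshold`.

    Each span is (start, end) with `end` exclusive. Mean entropy is an
    illustrative aggregation choice, not the paper's stated method.
    """
    selected = []
    for start, end in spans:
        entropies = [token_entropy(label_probs[i]) for i in range(start, end)]
        if entropies and sum(entropies) / len(entropies) > threshold:
            selected.append((start, end))
    return selected


# Hypothetical model output for four characters over three labels.
label_probs = [
    [0.98, 0.01, 0.01],  # confident
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # uncertain
    [0.40, 0.35, 0.25],  # uncertain
]
candidates = [(0, 2), (2, 4)]

# Only the second span is flagged; in a retrieval pipeline, its text would
# be sent to the external knowledge source before re-prediction.
flagged = uncertain_spans(candidates, label_probs, threshold=0.5)
```

Under these assumptions, only the flagged spans trigger retrieval, which is how uncertainty-based selection keeps the number of search queries small.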