Financial named entity recognition (FinNER) from literature is a challenging task in the field of financial text information extraction, which aims to extract a large amount of financial knowledge from unstructured texts. Sequence tagging frameworks are widely used to implement FinNER tasks. However, such sequence tagging models cannot fully take advantage of the semantic information in the texts. Instead, we formulate the FinNER task as a machine reading comprehension (MRC) problem and propose a new model termed FinBERT-MRC. This formulation introduces significant prior information through well-designed queries, and extracts the start and end indices of target entities without decoding modules such as conditional random fields (CRF). We conduct experiments on a publicly available Chinese financial dataset, ChFinAnn, and a real-world business dataset, AdminPunish. The FinBERT-MRC model achieves average F1 scores of 92.78% and 96.80% on the two datasets, respectively, with average F1 gains of +3.94% and +0.89% over sequence tagging models including BiLSTM-CRF, BERT-Tagger, and BERT-CRF. The source code is available at https://github.com/zyz0000/FinBERT-MRC.
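
To make the MRC formulation concrete, the following is a minimal sketch of how entity spans could be read off from per-token start and end scores without a CRF decoder. It is not taken from the FinBERT-MRC source code: the function names `build_mrc_input` and `decode_spans`, the query wording, the 0.5 threshold, the `max_len` cap, the nearest-end matching heuristic, and the hand-written probabilities are all assumptions chosen for illustration.

```python
from typing import List, Tuple


def build_mrc_input(query: str, context: str) -> str:
    """Join a query and a context in the usual BERT pair layout.

    The query carries the prior information about the entity type
    being asked for (hypothetical wording, not from the paper).
    """
    return f"[CLS] {query} [SEP] {context} [SEP]"


def decode_spans(
    start_probs: List[float],
    end_probs: List[float],
    threshold: float = 0.5,  # assumed cut-off for illustration
    max_len: int = 10,       # assumed maximum entity length in tokens
) -> List[Tuple[int, int]]:
    """Pair each predicted start with the nearest predicted end.

    A token counts as a start (or end) when its probability reaches
    the threshold. Each start is matched to the first end at or after
    it, within max_len tokens. Returned indices are inclusive.
    """
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, min(i + max_len, len(end_probs))):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


# Toy example with made-up scores standing in for model outputs.
query = "Find the institutions mentioned in the text."
tokens = ["Acme", "Bank", "and", "Beta", "Securities", "were", "fined"]
model_input = build_mrc_input(query, " ".join(tokens))

start_probs = [0.9, 0.1, 0.0, 0.8, 0.1, 0.0, 0.0]
end_probs = [0.1, 0.95, 0.0, 0.1, 0.7, 0.0, 0.0]

spans = decode_spans(start_probs, end_probs)
entities = [" ".join(tokens[s:e + 1]) for s, e in spans]
print(spans)     # [(0, 1), (3, 4)]
print(entities)  # ['Acme Bank', 'Beta Securities']
```

In this sketch one query is issued per entity type, so extracting several types from the same text would mean running the model once per query. A real system would obtain the start and end probabilities from the encoder's output layer rather than from fixed lists.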