We present KPI-BERT, a system that employs novel methods of named entity recognition (NER) and relation extraction (RE) to extract and link key performance indicators (KPIs) of companies, e.g. "revenue" or "interest expenses", from real-world German financial documents. Specifically, we introduce an end-to-end trainable architecture based on Bidirectional Encoder Representations from Transformers (BERT), which combines a recurrent neural network (RNN) with conditional label masking to tag entities sequentially before classifying their relations. Our model also introduces a learnable RNN-based pooling mechanism and incorporates domain expert knowledge by explicitly filtering impossible relations. On a new practical dataset of German financial reports, it achieves substantially higher prediction performance than several strong baselines, including a competing state-of-the-art span-based entity tagging approach.
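As a rough illustration of two ideas named above, conditional label masking during sequential tagging and the filtering of impossible relations, the following minimal sketch may help. It is not the KPI-BERT implementation: the BIO tag scheme, the entity types `kpi` and `value`, the relation whitelist, and all function names are assumptions made for this example, and plain per-token score lists stand in for the outputs of the BERT and RNN layers.

```python
import math

# Hypothetical entity types and BIO tag set; not taken from the paper.
ENTITY_TYPES = ["kpi", "value"]
LABELS = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]
# LABELS == ["O", "B-kpi", "I-kpi", "B-value", "I-value"]


def allowed_next(prev_label, label):
    """An I-x tag may only continue an entity of the same type x."""
    if label.startswith("I-"):
        entity_type = label[2:]
        return prev_label in ("B-" + entity_type, "I-" + entity_type)
    return True


def greedy_decode(scores):
    """Tag tokens left to right, masking labels that cannot follow
    the previously predicted label.

    scores: one list of per-label scores per token, ordered like LABELS.
    """
    prev, tags = "O", []
    for token_scores in scores:
        masked = [
            s if allowed_next(prev, lab) else -math.inf
            for s, lab in zip(token_scores, LABELS)
        ]
        best = max(range(len(LABELS)), key=masked.__getitem__)
        prev = LABELS[best]
        tags.append(prev)
    return tags


# Hypothetical whitelist of (head type, tail type) pairs that may be related.
ALLOWED_RELATIONS = {("kpi", "value")}


def candidate_pairs(entities):
    """Keep only ordered entity pairs whose types may be related.

    entities: list of (text, type) tuples.
    """
    return [
        (head, tail)
        for head in entities
        for tail in entities
        if head is not tail and (head[1], tail[1]) in ALLOWED_RELATIONS
    ]


if __name__ == "__main__":
    # The first token would score highest as "I-kpi", but that label is masked
    # because no entity has been opened yet.
    print(greedy_decode([[0.1, 0.3, 0.9, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0, 0.8]]))
    print(candidate_pairs([("revenue", "kpi"), ("12.5", "value")]))
```

In this sketch, masking sets the score of a forbidden label to negative infinity before the arg-max, so an invalid tag sequence can never be produced. The relation filter removes type combinations that are ruled out in advance, so a relation classifier would only ever see plausible pairs.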