We introduce KPI-EDGAR, a novel dataset for Joint Named Entity Recognition and Relation Extraction built on financial reports uploaded to the Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. The main objective is to extract Key Performance Indicators (KPIs) from financial documents and link them to their numerical values and other attributes. We further provide four accompanying baselines for benchmarking future research. Additionally, we propose a new way of measuring the success of this extraction process: a word-level weighting scheme is incorporated into the conventional F1 score to better model the inherently fuzzy borders of the entity pairs of a relation in this domain.
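To make the idea of a word-weighted F1 score concrete, the following is a minimal sketch and not the scoring used in this work. It assumes that entities are given as sets of token indices, that each token carries a hypothetical importance weight, and that a predicted span earns partial credit equal to the weighted overlap with its gold span. The function names `weighted_overlap` and `soft_f1`, the Jaccard-style overlap, and the example weights are all illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple


def weighted_overlap(pred: Set[int], gold: Set[int],
                     weights: Dict[int, float]) -> float:
    """Weighted Jaccard-style overlap of two token-index sets.

    Tokens without an entry in `weights` default to weight 1.0
    (an assumption made for this sketch).
    """
    union = pred | gold
    if not union:
        return 0.0
    shared = sum(weights.get(i, 1.0) for i in pred & gold)
    total = sum(weights.get(i, 1.0) for i in union)
    return shared / total


def soft_f1(matched: List[Tuple[Set[int], Set[int]]],
            n_pred: int, n_gold: int,
            weights: Dict[int, float]) -> float:
    """F1 where each matched (pred, gold) pair counts as a fractional hit."""
    soft_tp = sum(weighted_overlap(p, g, weights) for p, g in matched)
    precision = soft_tp / n_pred if n_pred else 0.0
    recall = soft_tp / n_gold if n_gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: gold span "total net revenue" (tokens 0..2),
# predicted span "net revenue" (tokens 1..2). The head word is weighted highest.
weights = {0: 0.5, 1: 1.0, 2: 2.0}
score = soft_f1([({1, 2}, {0, 1, 2})], n_pred=1, n_gold=1, weights=weights)
print(round(score, 4))
```

Under a strict exact-match criterion the prediction above would count as a miss, whereas the weighted variant gives it most of the credit because only a low-weight modifier is missing.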