As part of its digitization initiative, the German Central Bank (Deutsche Bundesbank) wants to examine the extent to which Natural Language Processing (NLP) can be used to make independent decisions on the eligibility criteria of securities prospectuses. Every month, the Directorate General Markets at the German Central Bank receives hundreds of scanned prospectuses in PDF format, which must be processed manually to decide on their eligibility. We found that this tedious and time-consuming process can be (semi-)automated by employing modern NLP model architectures, which learn linguistic feature representations from text to identify the eligible and ineligible criteria present in a document. The proposed Decision Support System provides document-level decisions on the eligibility criteria, accompanied by human-understandable explanations of those decisions. The aim of this project is to model the described use case and to evaluate the extent to which current research results from the field of NLP can be applied to this problem. After creating a heterogeneous domain-specific dataset containing annotations of eligible and non-eligible mentions of relevant criteria, we were able to build, train and deploy a semi-automatic decider model. The model combines transformer-based language models with decision trees, which integrate the established rule-based parts of the decision processes. Results suggest that it is possible to model the problem efficiently and to automate more than 90% of the decision making for many of the considered eligibility criteria.
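To make the idea of a semi-automatic decider more concrete, the following is a minimal sketch of how mention-level predictions might be aggregated into a document-level decision with an explanation. It is not the system described above: the `Mention` and `Decision` types, the `decide` function, the criterion names, the confidence threshold of 0.9, and the two aggregation rules are all assumptions made for illustration, and the language model that would produce the mentions is replaced by hand-written inputs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    """One passage that a (hypothetical) language model flagged for a criterion."""
    criterion: str     # illustrative criterion name, not taken from the source
    eligible: bool     # label the model assigned to this passage
    confidence: float  # model probability for that label, in [0, 1]
    text: str          # the passage itself, reused as the explanation


@dataclass
class Decision:
    criterion: str
    outcome: str       # "eligible", "ineligible", or "manual review"
    explanation: str


def decide(criterion: str, mentions: List[Mention], threshold: float = 0.9) -> Decision:
    """Aggregate mention-level predictions for one criterion with fixed rules."""
    relevant = [m for m in mentions if m.criterion == criterion]
    if not relevant:
        return Decision(criterion, "manual review", "no mention found")
    # Assumed rule 1: any low-confidence mention defers the decision to a human.
    uncertain = [m for m in relevant if m.confidence < threshold]
    if uncertain:
        return Decision(criterion, "manual review",
                        f"low confidence: {uncertain[0].text!r}")
    # Assumed rule 2: one confident ineligible mention makes the criterion ineligible.
    blocking = [m for m in relevant if not m.eligible]
    if blocking:
        return Decision(criterion, "ineligible",
                        f"ineligible mention: {blocking[0].text!r}")
    return Decision(criterion, "eligible", f"eligible mention: {relevant[0].text!r}")


# Hand-written stand-ins for model output (invented example passages).
mentions = [
    Mention("currency", True, 0.97, "The notes are denominated in euro."),
    Mention("subordination", False, 0.95, "The notes are subordinated obligations."),
    Mention("coupon", True, 0.62, "Interest may be linked to an index."),
]

for name in ("currency", "subordination", "coupon", "listing"):
    d = decide(name, mentions)
    print(f"{d.criterion}: {d.outcome} ({d.explanation})")
```

In this sketch the passage that triggered a rule doubles as the human-readable explanation, and anything the rules cannot settle with confidence is routed to manual review, which is one plausible reading of "semi-automatic".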