Financial institutions manage operational risk by carrying out the activities required by regulation, such as collecting loss data, calculating capital requirements, and reporting. The information needed for these activities is collected in OpRisk databases, which record, for each OpRisk event, the loss amount, dates, organizational units involved, event type, and a description. In recent years, operational risk functions have been required to go beyond their regulatory tasks and manage operational risk proactively, preventing or mitigating its impact. Because OpRisk databases also contain event descriptions, usually stored as free-text fields, there is an opportunity to exploit all the information contained in such records. As far as we are aware, the present work is the first to apply text analysis techniques to OpRisk event descriptions, thereby complementing and enriching the established framework of statistical methods based on quantitative data. Specifically, we apply text analysis methodologies to extract information from the descriptions in the OpRisk database. After delicate preprocessing tasks such as data cleaning, text vectorization, and semantic adjustment, we apply dimensionality reduction methods and several clustering models and algorithms, and compare their performance and weaknesses. Our results improve retrospective knowledge of loss events and make it possible to mitigate future risks.
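To make the pipeline described above concrete, the following is a minimal sketch of the cleaning, vectorization, and clustering steps, written with the Python standard library only. It is an illustration under stated assumptions, not the method used in this work: the event descriptions, the stopword list, the choice of TF-IDF weighting, and the use of spherical k-means with fixed seeds are all hypothetical, and the semantic adjustment and dimensionality reduction steps are omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical event descriptions (assumption: real OpRisk records differ).
DESCRIPTIONS = [
    "Customer card cloned at ATM, fraudulent withdrawal",
    "Fraudulent withdrawal with cloned card at ATM",
    "System outage delayed payment processing",
    "Payment processing delayed by system outage",
]
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"at", "with", "by", "the", "a", "of"}


def clean(text):
    """Data cleaning: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec


def tfidf(docs):
    """Text vectorization: unit-length TF-IDF vectors as sparse dicts."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        normalise({t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(doc).items()})
        for doc in docs
    ]


def cosine(u, v):
    """Dot product of two unit-length sparse vectors (cosine similarity)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def kmeans(vectors, seeds, iterations=10):
    """Spherical k-means; `seeds` are indices of the initial centroid documents."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each document to its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Recompute each centroid as the normalised sum of its members.
        for k in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    total.update(v)
            if total:
                centroids[k] = normalise(dict(total))
    return labels


vectors = tfidf([clean(d) for d in DESCRIPTIONS])
labels = kmeans(vectors, seeds=[0, 2])
print(labels)
```

On this toy input the two card-fraud descriptions share a cluster and the two system-outage descriptions share the other. In practice the clustering step would be repeated with several models and compared, as the abstract indicates.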