Occurrence reporting is widely used in safety management systems to gain insight into the prevalence of hazards and accident scenarios. To support safety data analysis, reports are often categorized according to a taxonomy. However, processing the reports can demand significant effort from safety analysts, and inter-rater variability in labeling is a common problem. In some cases, moreover, reports are not processed according to a taxonomy, or the taxonomy does not fully cover the contents of the documents. This paper explores several Natural Language Processing (NLP) methods to support the analysis of aviation safety occurrence reports. Specifically, it studies three problems: automatically labeling reports with a classification model, extracting the latent topics in a collection of texts with a topic model, and automatically generating probable cause texts. Experimental results show that (i) under the right conditions, the labeling of occurrence reports can be effectively automated with a transformer-based classifier, (ii) topic modeling can be useful for finding the topics present in a collection of reports, and (iii) using a summarization model is a promising direction for generating probable cause texts.