Exploring industrial safety knowledge via Zipf law

The hazard and operability analysis (HAZOP) report contains valuable industrial safety knowledge (ISK) that reflects expert experience and the nature of the process, and it is of great significance to the development of industrial intelligence. Given the characteristics of ISK, existing work mines it through deep-learning sequence labeling. Two thorny issues remain: (1) the uneven distribution of ISK and (2) the consistent importance of ISK for safety review. In this study, we propose a novel generative mining strategy called CRGM to explore ISK. Inspired by Zipf's law in linguistics, CRGM consists of a common-rare discriminator, an induction-extension generator and an ISK extractor. First, the common-rare discriminator divides the words of each HAZOP description into common words and rare words, yielding a common description and a rare description, where the latter contains more industrial substances. Next, the induction-extension generator processes both by means of deep text generation: the common description is induced and the rare description is extended, which enriches the material knowledge and the equipment knowledge. Finally, the ISK extractor obtains the material knowledge and equipment knowledge from the generated descriptions through a rule template method, and this additional ISK supplements the training set used to train the proposed sequence labeling model. We conduct multiple evaluation experiments on two industrial safety datasets. The results show that CRGM greatly improves the performance of the model and is efficient and generalizable. Our sequence labeling model also performs as expected and outperforms existing approaches. Our research provides a new perspective for exploring ISK, and we hope it can support the intelligent progress of industrial safety.
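To make the common-rare split concrete, the following is a minimal sketch of a frequency-rank discriminator in the spirit of Zipf's law. It is not the CRGM implementation: the function name `split_common_rare`, the `common_fraction` threshold, and the toy tokenized descriptions are all assumptions made for illustration, and the abstract does not state how the actual boundary between common and rare words is chosen.

```python
from collections import Counter


def split_common_rare(descriptions, common_fraction=0.2):
    """Split tokenized descriptions into common and rare words by frequency rank.

    descriptions: list of token lists (hypothetical, pre-tokenized input).
    common_fraction: assumed share of the ranked vocabulary treated as common.
    """
    counts = Counter(tok for desc in descriptions for tok in desc)
    # Rank words by descending frequency; ties are broken alphabetically
    # so the split is deterministic.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    n_common = max(1, int(len(ranked) * common_fraction))
    common_vocab = set(ranked[:n_common])

    pairs = []
    for desc in descriptions:
        common = [t for t in desc if t in common_vocab]  # common description
        rare = [t for t in desc if t not in common_vocab]  # rare description
        pairs.append((common, rare))
    return common_vocab, pairs


# Toy example (invented tokens, not data from the paper).
docs = [
    ["valve", "pressure", "high"],
    ["pressure", "high", "benzene"],
    ["pressure", "pump", "leak"],
]
common_vocab, pairs = split_common_rare(docs, common_fraction=0.4)
```

In this toy run the frequent words end up in the common description while the substance and equipment names (for example `benzene` and `pump`) fall into the rare description, which mirrors the abstract's remark that the rare description contains more industrial substances.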
