The need for predictive maintenance grows with the increasing number of incidents reported by monitoring systems and by equipment/software users. On the front line, on-call engineers (OCEs) have to quickly assess the severity of an incident and decide which service to contact for corrective actions. To automate these decisions, several predictive models have been proposed, but the most efficient models are opaque (i.e., black boxes), which strongly limits their adoption. In this paper, we propose an efficient black box model based on 170K incidents reported to our company over the last 7 years, and we emphasize the need to automate triage when incidents are reported en masse across thousands of servers running our product, an ERP. Recent developments in eXplainable Artificial Intelligence (XAI) help provide global explanations of the model and, most importantly, local explanations for each model prediction/outcome. Unfortunately, providing a human with an explanation for each outcome is not feasible when dealing with a large number of daily predictions. To address this problem, we propose an original data-mining method rooted in Subgroup Discovery, a pattern mining technique with the natural ability to group objects that share similar explanations of their black box predictions and to provide a description for each group. We evaluate this approach and present preliminary results, which are encouraging for effective adoption by OCEs. We believe that this approach provides a new way to address the problem of model-agnostic outcome explanation.