Artificial Intelligence for IT Operations (AIOps) has been adopted by organizations for various tasks, including interpreting models to identify indicators of service failures. To avoid misleading practitioners, AIOps model interpretations should be consistent (i.e., different AIOps models for the same task should agree with one another on feature importance). However, many AIOps studies violate established practices of the machine learning community when deriving interpretations, such as interpreting models with suboptimal performance, and the impact of such violations on interpretation consistency has not been studied. In this paper, we investigate the consistency of AIOps model interpretations along three dimensions: internal consistency, external consistency, and time consistency. We conduct a case study on two AIOps tasks: predicting Google cluster job failures and predicting Backblaze hard drive failures. We find that the randomness introduced by learners, hyperparameter tuning, and data sampling should be controlled to generate consistent interpretations. AIOps models with AUCs greater than 0.75 yield more consistent interpretations than low-performing models. Finally, AIOps models constructed with the Sliding Window or Full History approaches yield interpretations that are most consistent with the trends present in the entire dataset. Our study provides valuable guidelines for practitioners to derive consistent AIOps model interpretations.
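To illustrate the internal-consistency concern, the following minimal Python sketch (not taken from the paper; the synthetic dataset, the random forest learner, and the Kendall's-tau agreement measure are illustrative assumptions) trains the same learner under two different random seeds and compares the resulting feature-importance rankings. Uncontrolled learner randomness can cause the two runs to rank features differently, whereas fixing a single seed yields identical rankings.

```python
# Minimal sketch of checking internal consistency of model interpretations.
# Assumptions: a synthetic dataset stands in for AIOps data (e.g., job or
# drive failure records), and Kendall's tau measures ranking agreement.
from scipy.stats import kendalltau
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for an AIOps failure-prediction dataset.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

# Two training runs with different seeds: learner randomness is uncontrolled,
# so the feature-importance rankings may disagree.
rf_a = RandomForestClassifier(random_state=1).fit(X, y)
rf_b = RandomForestClassifier(random_state=2).fit(X, y)
tau, _ = kendalltau(rf_a.feature_importances_, rf_b.feature_importances_)
print(f"Agreement across seeds (Kendall's tau): {tau:.3f}")

# Controlling the randomness by fixing one seed makes the interpretation
# reproducible: the two runs produce identical importance rankings (tau = 1.0).
rf_c = RandomForestClassifier(random_state=42).fit(X, y)
rf_d = RandomForestClassifier(random_state=42).fit(X, y)
tau, _ = kendalltau(rf_c.feature_importances_, rf_d.feature_importances_)
print(f"Agreement with a fixed seed (Kendall's tau): {tau:.3f}")
```

The same seed-controlling idea extends to hyperparameter tuning and data sampling, the other two sources of randomness the study identifies.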