In recent years, AIOps (Artificial Intelligence for IT Operations) has been widely studied in both academia and industry as a way to automate and improve software service management. Considerable effort has gone into AIOps tasks such as anomaly detection, root cause localization, and incident management. However, most existing work is evaluated on private datasets, so its generality and real-world performance cannot be guaranteed. The lack of public, large-scale, real-world datasets has hindered researchers and engineers from advancing AIOps. To address this problem, we introduce three public, large-scale, real-world AIOps datasets, mainly targeting KPI anomaly detection, root cause localization on multi-dimensional data, and failure discovery and diagnosis. More importantly, we held three competitions based on these datasets in 2018, 2019, and 2020, attracting thousands of teams. In the future, we will continue to publish more datasets and hold competitions to further promote the development of AIOps.