The increasing complexity of IT systems requires solutions that support operations in case of failure. Artificial Intelligence for System Operations (AIOps) has therefore become an increasingly active field of research, in both academia and industry. One of the major issues in this area is the lack of access to adequately labeled data, largely due to legal data-protection regulations or industrial confidentiality. Methods to mitigate this stem from the area of federated learning, in which no direct access to training data is required. Original approaches utilize a central instance to perform model synchronization by periodically aggregating all model parameters. However, there are many scenarios where trained models cannot be published, since they either constitute confidential knowledge or training data could be reconstructed from them. Furthermore, the central instance must be trusted and is a single point of failure. As a solution, we propose a fully decentralized approach that allows knowledge to be shared between trained models. Neither original training data nor model parameters need to be transmitted. The concept relies on teacher and student roles assigned to the models: students are trained on the output of their teachers via synthetically generated input data. We conduct a case study on log anomaly detection. The results show that an untrained student model, trained on a teacher's output, reaches F1-scores comparable to the teacher's. In addition, we demonstrate that our method allows the synchronization of several models trained on different, distinct subsets of training data.
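The teacher-student synchronization described above can be sketched in a few lines. The following is a minimal, hypothetical illustration, not the paper's implementation: both parties are simple logistic-regression models, the teacher is trained on private data that is never transmitted, and the student only observes the teacher's predictions on synthetically generated inputs (knowledge distillation). All names and model choices here are illustrative assumptions.

```python
# Minimal sketch of decentralized knowledge transfer via a teacher-student
# setup with synthetic inputs. Assumption: both models are logistic
# regressions trained by gradient descent; the real method is model-agnostic.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, epochs=300, lr=0.5):
    """Fit a logistic regression with plain gradient descent.
    Works with hard labels (0/1) or soft labels (probabilities)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        grad = p - y                      # gradient of the log loss
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

# Teacher side: trained locally on private data (never shared).
X_priv = rng.normal(size=(500, 4))
y_priv = (X_priv[:, 0] + X_priv[:, 1] > 0).astype(float)
w_t, b_t = train_logreg(X_priv, y_priv)

# Student side: generate synthetic inputs and query ONLY the teacher's
# outputs -- neither the private data nor (w_t, b_t) are transmitted.
X_syn = rng.normal(size=(2000, 4))
y_soft = sigmoid(X_syn @ w_t + b_t)       # teacher predictions (soft labels)
w_s, b_s = train_logreg(X_syn, y_soft)    # distill into the student

# The student should now largely agree with the teacher on unseen data.
X_test = rng.normal(size=(200, 4))
agree = np.mean((sigmoid(X_test @ w_s + b_s) > 0.5)
                == (sigmoid(X_test @ w_t + b_t) > 0.5))
print(f"teacher/student agreement: {agree:.2f}")
```

In this toy setting the distilled student reproduces the teacher's decision boundary almost exactly, which mirrors the abstract's claim that a student trained only on teacher outputs reaches comparable performance.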