The distributed Grid infrastructure for High Energy Physics experiments at the Large Hadron Collider (LHC) in Geneva comprises a set of computing centres, spread all over the world, as part of the Worldwide LHC Computing Grid (WLCG). In Italy, the Tier-1 functions are provided by the INFN-CNAF data center, which also provides computing and storage resources to more than twenty non-LHC experiments. As a result, a large volume of logs is collected each day from many sources, which are highly heterogeneous and difficult to harmonize. This contribution presents a working implementation of a system that collects, parses and displays log information from CNAF data sources, together with an investigation of a Machine Learning based predictive maintenance system.