System logs are a common source of monitoring data for analyzing the behavior of computing systems. Because modern computing systems are complex and the collected monitoring data is large, automated analysis mechanisms are required, and numerous machine learning and deep learning methods have been proposed to address this challenge. However, system logs contain sensitive data, so analyzing and storing them raises serious privacy concerns. Anonymization methods can be used to clean the monitoring data before analysis, but anonymized system logs are, in general, not useful enough for most behavioral analyses. Content-aware anonymization mechanisms such as PaRS preserve the correlation of system logs even after anonymization. This work evaluates the usefulness of system logs collected from the Taurus HPC cluster and anonymized using PaRS for behavioral analysis via recurrent neural network models.