Log analysis is a vital activity for detecting faults and cyber incidents, investigating them, and carrying out technical forensic analysis in support of system and cyber resilience. Applying AI algorithms to log analysis could augment these complex and laborious tasks. However, such solutions face constraints: log sources are heterogeneous, and few or no labels are available for training a classifier. When labels do become available, the classifier needs to be updated. This practice-based research seeks to address these challenges by using a Transformer construct to train a new model on normal log entries only. Log augmentation through multiple forms of perturbation is applied as a form of self-supervised training for feature learning. The model is then fine-tuned using a form of reinforcement learning with a limited set of labelled samples, mimicking the real-world situation in which labels become available. In a comparative evaluation, the experimental results of our model construct show promise, paving the way for future practical applications.
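The abstract does not say which perturbations are used for log augmentation, so the following is only a minimal sketch of the general idea, assuming whitespace tokenization and three common choices (token masking, token dropping, and adjacent-token swapping). The names `MASK`, `mask_tokens`, `drop_tokens`, `swap_adjacent`, and `make_views` are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List

MASK = "[MASK]"  # hypothetical mask token; an assumption, not from the paper


def mask_tokens(tokens: List[str], rng: random.Random, rate: float = 0.15) -> List[str]:
    """Replace each token with MASK independently with probability `rate`."""
    return [MASK if rng.random() < rate else t for t in tokens]


def drop_tokens(tokens: List[str], rng: random.Random, rate: float = 0.15) -> List[str]:
    """Delete each token independently with probability `rate`."""
    kept = [t for t in tokens if rng.random() >= rate]
    # Keep at least the first token so a non-empty entry never becomes empty.
    return kept or tokens[:1]


def swap_adjacent(tokens: List[str], rng: random.Random) -> List[str]:
    """Swap one randomly chosen pair of neighbouring tokens."""
    out = list(tokens)
    if len(out) >= 2:
        i = rng.randrange(len(out) - 1)
        out[i], out[i + 1] = out[i + 1], out[i]
    return out


# Assumed set of perturbations; the paper's actual set may differ.
PERTURBATIONS: List[Callable[[List[str], random.Random], List[str]]] = [
    mask_tokens,
    drop_tokens,
    swap_adjacent,
]


def make_views(line: str, n_views: int = 2, seed: int = 0) -> List[List[str]]:
    """Produce `n_views` perturbed copies of one normal log entry."""
    rng = random.Random(seed)  # seeded for reproducible augmentation
    tokens = line.split()
    return [rng.choice(PERTURBATIONS)(tokens, rng) for _ in range(n_views)]


if __name__ == "__main__":
    entry = "Accepted password for root from 10.0.0.5 port 22"
    for view in make_views(entry, n_views=3, seed=1):
        print(" ".join(view))
```

In a self-supervised setup of this kind, the perturbed views of a normal entry would typically be fed to the encoder alongside the original so that it learns features without anomaly labels; how the paper pairs views and which training objective it uses is not stated in the abstract.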