LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model

System logs generated in computer systems are large-scale data that are collected simultaneously and used as the basis for diagnosing simple errors and for detecting external adversarial intrusions or abnormal insider behavior. The aim of system log anomaly detection is to identify anomalies promptly while minimizing human intervention, which is a critical problem in industry. Previous studies performed anomaly detection with algorithms applied after a parser had converted the various forms of log data into standardized templates. These methods generate a template to refine the log key. In particular, a template corresponding to each specific event must be defined in advance for all the log data, and information within the log key may be lost in the process. In this study, we propose LAnoBERT, a parser-free system log anomaly detection method that uses BERT, a model with excellent natural language processing performance. LAnoBERT trains the model through masked language modeling, a BERT pre-training method, and performs unsupervised anomaly detection at inference time using the masked language modeling loss for each word of the log key. In experiments on the benchmark log datasets HDFS and BGL, LAnoBERT achieved better performance than previous methods, and also better performance than certain supervised learning-based models.
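The scoring idea in the abstract (mask each word of a log line, measure how badly a model trained only on normal logs predicts it, and treat a high loss as a sign of anomaly) can be illustrated with a minimal sketch. This is not the authors' implementation. A toy count-based predictor stands in for BERT, and the function names, the sample log lines, the smoothing constant `alpha`, and the top-k averaging of losses are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(line):
    # Whitespace tokenization with boundary markers; no log parser or template is used.
    return ["<s>"] + line.lower().split() + ["</s>"]


def train_masked_model(normal_logs):
    """Count which token fills each (left, right) context in normal logs.

    Toy stand-in for BERT masked language modeling (assumption, not the paper's model).
    """
    context_counts = defaultdict(Counter)
    vocab = set()
    for line in normal_logs:
        tokens = tokenize(line)
        vocab.update(tokens)
        for i in range(1, len(tokens) - 1):
            context_counts[(tokens[i - 1], tokens[i + 1])][tokens[i]] += 1
    # One extra vocabulary slot reserves probability mass for unseen tokens.
    return context_counts, len(vocab) + 1


def masked_token_losses(line, context_counts, vocab_size, alpha=0.1):
    """Mask each word in turn and return its negative log-likelihood."""
    tokens = tokenize(line)
    losses = []
    for i in range(1, len(tokens) - 1):
        counts = context_counts.get((tokens[i - 1], tokens[i + 1]), Counter())
        # Additive smoothing keeps the probability non-zero for unseen words.
        prob = (counts[tokens[i]] + alpha) / (sum(counts.values()) + alpha * vocab_size)
        losses.append(-math.log(prob))
    return losses


def anomaly_score(line, context_counts, vocab_size, top_k=3):
    """Average the top_k largest per-word losses (aggregation rule is an assumption)."""
    losses = sorted(masked_token_losses(line, context_counts, vocab_size), reverse=True)
    top = losses[:top_k]
    return sum(top) / len(top) if top else 0.0


# Hypothetical normal log lines, loosely styled after HDFS messages.
normal_logs = [
    "receiving block blk_1 src 10.0.0.1 dest 10.0.0.2",
    "receiving block blk_2 src 10.0.0.3 dest 10.0.0.2",
    "received block blk_1 of size 1024 from 10.0.0.1",
    "received block blk_2 of size 2048 from 10.0.0.3",
]
context_counts, vocab_size = train_masked_model(normal_logs)

normal_score = anomaly_score(normal_logs[0], context_counts, vocab_size)
abnormal_score = anomaly_score(
    "exception while serving block blk_1 to 10.0.0.1", context_counts, vocab_size
)
print(f"normal: {normal_score:.3f}  abnormal: {abnormal_score:.3f}")
```

In this sketch only normal lines are used for training, so no anomaly labels are needed. A line is flagged when its score exceeds a threshold chosen on held-out normal data; swapping the count-based predictor for a BERT masked language model would change how the per-word probabilities are obtained but not the scoring loop.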
