Kernel traces are sequences of low-level events, each comprising a name and multiple arguments, such as a timestamp, a process id, and a return value, depending on the event. Their analysis helps uncover intrusions, identify bugs, and find latency causes. However, this analysis is hindered when the event arguments are omitted. To remedy this limitation, we introduce a general approach to learning a representation of the event names along with their arguments using both embedding and encoding. The proposed method is readily applicable to most neural networks and is task-agnostic. Its benefit is quantified through an ablation study on three groups of arguments: call-related, process-related, and time-related. Experiments were conducted on a novel web request dataset and validated on a second dataset collected on pre-production servers by Ciena, our partnering company. By leveraging the additional information, we increased the performance of two widely used neural networks, an LSTM and a Transformer, by up to 11.3% on two unsupervised language modelling tasks. Such tasks may be used to detect anomalies, pre-train neural networks to improve their performance, and extract a contextual representation of the events.
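To make the "embedding and encoding" idea concrete, the sketch below shows one plausible way to combine a learned embedding of the event name with encodings of its arguments, grouped as in the ablation study. This is a minimal illustration under assumed shapes and names (KernelEventEncoder, arg_dim, the single-feature argument groups, etc. are all hypothetical), not the paper's actual implementation.

```python
# Minimal sketch: embed the event name, encode each argument group,
# and fuse them into a single vector per event. All class/parameter
# names here are illustrative assumptions.
import torch
import torch.nn as nn

class KernelEventEncoder(nn.Module):
    def __init__(self, vocab_size, name_dim=64, arg_dim=16, out_dim=128):
        super().__init__()
        # Learned embedding for the categorical event name.
        self.name_embedding = nn.Embedding(vocab_size, name_dim)
        # One small encoder per argument group, mirroring the paper's
        # call-, process-, and time-related grouping (hypothetical split).
        self.call_encoder = nn.Linear(1, arg_dim)  # e.g., return value
        self.proc_encoder = nn.Linear(1, arg_dim)  # e.g., process id feature
        self.time_encoder = nn.Linear(1, arg_dim)  # e.g., inter-event delay
        # Project the concatenation to the model dimension so the result
        # can feed any sequence model (LSTM, Transformer, ...).
        self.project = nn.Linear(name_dim + 3 * arg_dim, out_dim)

    def forward(self, name_ids, call_args, proc_args, time_args):
        parts = [
            self.name_embedding(name_ids),
            self.call_encoder(call_args),
            self.proc_encoder(proc_args),
            self.time_encoder(time_args),
        ]
        return self.project(torch.cat(parts, dim=-1))

# Toy usage: a batch of 2 traces with 5 events each.
enc = KernelEventEncoder(vocab_size=300)
names = torch.randint(0, 300, (2, 5))
calls = torch.randn(2, 5, 1)
procs = torch.randn(2, 5, 1)
times = torch.randn(2, 5, 1)
events = enc(names, calls, procs, times)  # shape: (2, 5, 128)
```

Because the encoder only changes how each event is represented, not how the sequence is modelled, it can be dropped in front of most architectures, which is what makes the approach task-agnostic.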