Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files - Zhuanzhi Papers


Compression algorithms · Algorithms · Frequent patterns · Algorithm extensions · Small files

April 10, 2023

Extension of Dictionary-Based Compression Algorithms for the Quantitative Visualization of Patterns from Log Files


Igor Cherepanov,Jonathan Geraldi Joewono,Arjan Kuijper,Jörn Kohlhammer

From arXiv; submitted to EuroVA 2023

Many services today continuously produce large volumes of log files in different and varying formats. These logs are important because they record application activity, and analyzing that activity is necessary for improving the system and maintaining its security and stability. Log files are commonly stored in compressed form to reduce their size. A compression algorithm identifies frequent patterns in a log file in order to remove redundant information. This work presents an approach for detecting frequent patterns in textual data that registers them during the file compression process itself, with low resource consumption. The log file can then be visualized, and the extracted patterns explored using metrics based on properties such as the frequency, length, and root prefixes of each acquired pattern. This allows an analyst to gain relevant insights more efficiently and reduces the need for labor-intensive manual inspection of the log data. Because it extends a dictionary-based compression algorithm, the approach recognizes patterns in log files of any format and eliminates the need to manually prepare any preprocessing of the log files.
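To make the idea of registering patterns during compression concrete, here is a minimal sketch. The abstract does not say which dictionary-based algorithm the authors extend, so this uses textbook LZW as a stand-in. The function names, the `frequency * length` ranking, and the reading of "root prefix" as the first character of a phrase are all assumptions made for illustration, not the paper's definitions.

```python
from collections import Counter


def lzw_compress_with_patterns(text):
    """LZW-compress `text` and count how often each dictionary phrase is emitted."""
    # Seed the dictionary with every distinct character of the input
    # (a simplification; byte-oriented LZW would seed all 256 byte values).
    dictionary = {ch: i for i, ch in enumerate(sorted(set(text)))}
    codes = []
    usage = Counter()  # phrase -> number of times its code was emitted
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate  # keep extending the longest known match
        else:
            codes.append(dictionary[current])
            usage[current] += 1
            dictionary[candidate] = len(dictionary)  # learn a new phrase
            current = ch
    if current:
        codes.append(dictionary[current])
        usage[current] += 1
    return codes, usage


def rank_patterns(usage, min_length=2):
    """Return (phrase, frequency, length, root_prefix) rows, highest score first.

    Assumption: score = frequency * length, root_prefix = first character.
    """
    rows = [(p, n, len(p), p[0]) for p, n in usage.items() if len(p) >= min_length]
    return sorted(rows, key=lambda row: row[1] * row[2], reverse=True)


if __name__ == "__main__":
    # Hypothetical log excerpt, used only to exercise the sketch.
    log = "GET /index 200\nGET /index 200\nGET /login 404\nGET /index 200\n"
    codes, usage = lzw_compress_with_patterns(log)
    print(len(log), "characters ->", len(codes), "codes")
    for phrase, freq, length, root in rank_patterns(usage)[:5]:
        print(repr(phrase), freq, length, repr(root))
```

The pattern statistics come from the same pass that produces the compressed codes, so no separate parsing or format-specific preprocessing step is needed. A real implementation would also have to bound dictionary growth, which this sketch ignores.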

