Analyzing large observational data sets generated by a reactive system is a common challenge in debugging system failures and determining their root cause. Examples include analyzing traffic logs from networks and simulation logs from circuit design. A major problem is that such observational data suffer from survivorship bias. In these applications, users want to detect non-spurious correlations in observational data and obtain actionable insights from them. In this paper, we introduce log to Neuro-symbolic (Log2NS), a framework that combines probabilistic analysis from machine learning (ML) techniques on observational data with certainties derived from symbolic reasoning on an underlying formal model. We apply the proposed framework to network traffic debugging in the following steps. To detect patterns in network logs, we first generate global embedding vector representations of entities such as IP addresses, ports, and applications. Next, we group the large number of log flow entries into clusters, which makes it easier for the user to visualize and detect interesting scenarios for further analysis. To generalize these patterns, Log2NS provides the ability to query static logs and correlation engines for positive instances, and to apply formal reasoning to negative and unseen instances. By combining the strengths of deep learning and symbolic methods, Log2NS provides a reasoning and debugging tool for log-based data. Empirical evaluations on a real internal data set demonstrate the capabilities of Log2NS.
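The first two steps (entity embeddings, then clustering of flow entries) can be illustrated with a minimal sketch. It is not the Log2NS implementation: co-occurrence counts stand in for learned embedding vectors, a greedy cosine-similarity grouping stands in for whatever clustering method the framework uses, and the flow records, field layout, function names, and threshold are all hypothetical.

```python
from itertools import combinations
import math

# Hypothetical flow records: (source IP, destination IP, destination port, application).
FLOWS = [
    ("10.0.0.1", "10.0.1.5", "443", "https"),
    ("10.0.0.2", "10.0.1.5", "443", "https"),
    ("10.0.0.1", "10.0.1.5", "443", "https"),
    ("10.0.0.9", "10.0.2.7", "53", "dns"),
    ("10.0.0.8", "10.0.2.7", "53", "dns"),
]

def build_embeddings(flows):
    """Map each entity to a co-occurrence count vector over the whole log.

    Assumption: this is a crude stand-in for learned global embeddings.
    """
    vocab = sorted({entity for flow in flows for entity in flow})
    index = {entity: i for i, entity in enumerate(vocab)}
    vectors = {entity: [0.0] * len(vocab) for entity in vocab}
    for flow in flows:
        # Count every unordered pair of distinct entities seen in the same flow.
        for a, b in combinations(set(flow), 2):
            vectors[a][index[b]] += 1.0
            vectors[b][index[a]] += 1.0
    return vectors

def flow_vector(flow, vectors):
    """Represent a flow entry as the mean of its entities' vectors."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for entity in flow:
        for i, value in enumerate(vectors[entity]):
            total[i] += value
    return [value / len(flow) for value in total]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_flows(flows, vectors, threshold=0.5):
    """Greedy grouping: a flow joins the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed vector, member flow indices)
    for i, flow in enumerate(flows):
        vec = flow_vector(flow, vectors)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]

if __name__ == "__main__":
    embeddings = build_embeddings(FLOWS)
    print(cluster_flows(FLOWS, embeddings))
```

On this toy log the HTTPS and DNS flows share no entities, so their vectors are orthogonal and they fall into separate groups. A real log would need learned embeddings and a more robust clustering method.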