Provenance is a record that describes how entities, activities, and agents have influenced a piece of data; it is commonly represented as a graph with labels on both its nodes and its edges. With the growing adoption of provenance in a wide range of application domains, users are increasingly confronted with an abundance of graph data, which can be challenging to process. Graph kernels, meanwhile, have been used successfully to analyse graphs efficiently. In this paper, we introduce a novel graph kernel, called the provenance kernel, which is inspired by and tailored for provenance data. It decomposes a provenance graph into tree-patterns rooted at a given node and considers the labels of edges and nodes up to a certain distance from the root. We employ provenance kernels to classify provenance graphs from three application domains. Our evaluation shows that they perform well in terms of classification accuracy and yield competitive results when compared against existing graph kernel methods and the provenance network analytics method, while being more efficient in computing time. Moreover, the provenance types used by provenance kernels also help improve the explainability of predictive models built on them.
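The decomposition described in the abstract (tree-patterns rooted at each node, built from node and edge labels up to a fixed distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the graph encoding, the signature format, and the names `tree_patterns` and `provenance_kernel` are all assumptions made for illustration, and the kernel value here is simply the dot product of pattern counts.

```python
from collections import Counter

def tree_patterns(node_labels, edges, depth):
    """Count tree-pattern signatures rooted at every node, for distances 0..depth.

    node_labels: dict mapping node id -> label (e.g. "entity", "activity", "agent")
    edges: list of (source, edge_label, target) triples
    NOTE: hypothetical encoding, chosen only for this sketch.
    """
    # Outgoing adjacency: node -> list of (edge_label, target)
    out = {n: [] for n in node_labels}
    for src, label, dst in edges:
        out[src].append((label, dst))

    # Distance 0: the pattern is just the node's own label.
    current = {n: (node_labels[n],) for n in node_labels}
    counts = Counter(current.values())

    for _ in range(depth):
        nxt = {}
        for n in node_labels:
            # Set of (edge label, child pattern) pairs, sorted so that the
            # signature does not depend on edge order.
            children = sorted({(lbl, current[m]) for lbl, m in out[n]})
            nxt[n] = (node_labels[n], tuple(children))
        current = nxt
        counts.update(current.values())
    return counts

def provenance_kernel(g1, g2, depth=2):
    """Dot product of the two graphs' pattern-count vectors (a sketch)."""
    f1 = tree_patterns(*g1, depth)
    f2 = tree_patterns(*g2, depth)
    return sum(f1[p] * f2[p] for p in f1.keys() & f2.keys())

# Two toy provenance graphs using PROV-style labels.
g1 = (
    {"e1": "entity", "a1": "activity", "ag1": "agent"},
    [("e1", "wasGeneratedBy", "a1"), ("a1", "wasAssociatedWith", "ag1")],
)
g2 = (
    {"e1": "entity", "a1": "activity"},
    [("e1", "wasGeneratedBy", "a1")],
)

print(provenance_kernel(g1, g2, depth=1))  # 3
```

The two toy graphs share three patterns at depth 1: the bare `entity` and `activity` labels, and an entity generated by an activity. A matrix of such kernel values could then be passed to any kernel-based classifier; which classifier the paper uses is not stated in the abstract.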