In temporal interaction networks, vertices correspond to entities that exchange data quantities (e.g., money, bytes, messages) over time. Tracking the origin of the data that have reached a given vertex at any time can help data analysts understand the reasons behind the quantity accumulated at the vertex or behind the interactions between entities. In this paper, we study data provenance in a temporal interaction network. We investigate alternative propagation models that may apply to different application scenarios. For each such model, we propose annotation mechanisms that track the origin of propagated data in the network and the routes of data quantities. Besides analyzing the space and time complexity of these mechanisms, we propose techniques that reduce their cost in practice, by either (i) limiting provenance tracking to a subset of vertices or groups of vertices, or (ii) tracking provenance only for quantities that were generated in the recent past or limiting the provenance data in each vertex by a budget constraint. Our experimental evaluation on five real datasets shows that quantity propagation models based on generation time or receipt order scale well on large graphs; in contrast, a model that propagates quantities proportionally has high space and time requirements and can benefit from the aforementioned cost reduction techniques.
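To make the idea of origin tracking concrete, the following is a minimal sketch of one possible receipt-order (first-in, first-out) propagation model. It is an illustration only, not the paper's annotation mechanism: the function name `track_provenance_fifo`, the `(src, dst, time, quantity)` tuple layout, and the rule that any shortfall at the sender is treated as newly generated at the sender are all assumptions made for this example.

```python
from collections import defaultdict, deque


def track_provenance_fifo(interactions):
    """Return, per vertex, the buffered (origin, quantity) pairs in receipt order.

    interactions: iterable of (src, dst, time, quantity) tuples (hypothetical layout).
    """
    # Each vertex keeps a FIFO buffer of (origin, quantity) pairs.
    buffers = defaultdict(deque)

    # Process interactions in order of their timestamps.
    for src, dst, _time, qty in sorted(interactions, key=lambda r: r[2]):
        remaining = qty
        # Relay the oldest received quantities first.
        while remaining > 0 and buffers[src]:
            origin, available = buffers[src][0]
            moved = min(available, remaining)
            buffers[dst].append((origin, moved))
            remaining -= moved
            if moved == available:
                buffers[src].popleft()
            else:
                buffers[src][0] = (origin, available - moved)
        # Assumption: whatever src could not cover from its buffer
        # is generated at src, so src is recorded as its origin.
        if remaining > 0:
            buffers[dst].append((src, remaining))

    # Drop vertices whose buffers ended up empty.
    return {v: list(b) for v, b in buffers.items() if b}


example = [("a", "b", 1, 5), ("c", "b", 2, 3), ("b", "d", 3, 6)]
print(track_provenance_fifo(example))
```

In this sketch, the quantity held at each vertex is always annotated with the vertex that generated it, so the question "where did the data at vertex d come from?" is answered by reading d's buffer. A proportional model would instead split every outgoing quantity across all origins present in the sender's buffer, which is one intuition for why its provenance annotations can grow much larger.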