Many systems generate data as a set of triplets (a, b, c): for example, a triplet may record that user a called user b at time c, or that customer a purchased product b in store c. Such datasets are traditionally studied as networks with an extra dimension (time or layer), and the fields of temporal and multiplex networks have extended graph theory to account for this dimension. However, these frameworks detach one variable from the others and allow the same concept to be extended in many different ways. This makes it hard to capture patterns across all dimensions and to identify the most appropriate definitions for a given dataset. This extended abstract departs from that view and proposes to process the set of triplets directly. In particular, we show that a more general analysis is possible by partitioning the data and building categorical propositions that encode informative patterns. We show that several concepts from graph theory can be framed in this formalism, and we use this insight to extend those concepts to data triplets. Finally, we propose an algorithm that lists the propositions satisfying given constraints, and we apply it to a real-world dataset.
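The abstract does not spell out its formalism, so the following is only a rough illustration. It stores triplets as a plain set, partitions them by one coordinate, and evaluates the four classical categorical proposition forms over them: A (all S are P), E (no S are P), I (some S are P), and O (some S are not P). The helper names `partition` and `evaluate`, the predicate-based selection of S and P, and the toy call data are hypothetical. They are not the authors' construction or algorithm.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Set, Tuple

# Hypothetical representation: a triplet (a, b, c), e.g. (caller, callee, time).
Triplet = Tuple[str, str, str]
Pred = Callable[[Triplet], bool]


def partition(triplets: Iterable[Triplet],
              key: Callable[[Triplet], Hashable]) -> Dict[Hashable, Set[Triplet]]:
    """Split the triplet set into disjoint parts according to `key`."""
    parts: Dict[Hashable, Set[Triplet]] = defaultdict(set)
    for t in triplets:
        parts[key(t)].add(t)
    return dict(parts)


def evaluate(triplets: Iterable[Triplet], subject: Pred, predicate: Pred) -> Dict[str, bool]:
    """Evaluate the classical A/E/I/O forms, where S is the set of triplets
    selected by `subject` and P is the property tested by `predicate`.
    Modern (Boolean) reading: if S is empty, A and E hold vacuously."""
    in_p = [predicate(t) for t in triplets if subject(t)]
    return {
        "A": all(in_p),       # All S are P
        "E": not any(in_p),   # No S are P
        "I": any(in_p),       # Some S are P
        "O": not all(in_p),   # Some S are not P
    }


# Toy data (illustrative only): user a called user b at time c.
calls: Set[Triplet] = {
    ("u1", "u2", "t1"),
    ("u1", "u3", "t1"),
    ("u2", "u3", "t2"),
    ("u3", "u1", "t2"),
}

by_time = partition(calls, key=lambda t: t[2])

# "All calls at time t1 are made by u1."
at_t1 = evaluate(calls, subject=lambda t: t[2] == "t1", predicate=lambda t: t[0] == "u1")
# "No calls at time t2 are made by u1."
at_t2 = evaluate(calls, subject=lambda t: t[2] == "t2", predicate=lambda t: t[0] == "u1")
```

Here `at_t1` holds in its A and I forms, and `at_t2` holds in its E and O forms. The design treats the triplet set as the primary object and builds S and P from any coordinate. It does not first fix one variable as a "time" or "layer" axis, which is the spirit the abstract describes.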