Sequences of group interactions, such as emails, online discussions, and co-authorships, are ubiquitous, and they are naturally represented as a stream of hyperedges. Despite its broad potential applications, anomaly detection in hypergraphs (i.e., sets of hyperedges) has received surprisingly little attention compared to anomaly detection in graphs. While it is tempting to reduce hypergraphs to graphs and apply existing graph-based methods, our experiments indicate that taking the higher-order structure of hypergraphs into consideration is worthwhile. We propose HashNWalk, an incremental algorithm that detects anomalies in a stream of hyperedges. It maintains and updates a constant-size summary of the structural and temporal information about the stream. Using the summary, which takes the form of a proximity matrix, HashNWalk measures the anomalousness of each new hyperedge as it appears. HashNWalk is (a) Fast: it processes each hyperedge in near real-time and billions of hyperedges within a few hours, (b) Space Efficient: the size of the maintained summary is a predefined constant, and (c) Effective: it successfully detects anomalous hyperedges in real-world hypergraphs.
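To make the idea of a constant-size summary concrete, the following is a minimal illustrative sketch, not the HashNWalk algorithm itself. It assumes a simplified scheme: nodes are hashed into a fixed number of buckets, a bucket-by-bucket proximity matrix accumulates time-decayed co-occurrence counts, and a new hyperedge is scored as more anomalous when its buckets have low proximity to one another. The class name `HyperedgeStreamSummary`, the parameters `num_buckets` and `decay`, and the scoring formula are all hypothetical choices made for illustration.

```python
import zlib


class HyperedgeStreamSummary:
    """Toy constant-size summary of a hyperedge stream (illustrative only)."""

    def __init__(self, num_buckets=8, decay=0.9):
        self.num_buckets = num_buckets  # assumed fixed summary dimension
        self.decay = decay              # assumed per-time-unit decay factor
        # The summary: a num_buckets x num_buckets proximity matrix.
        self.proximity = [[0.0] * num_buckets for _ in range(num_buckets)]
        self.last_time = None

    def _bucket(self, node):
        # Deterministic hash of a node label into a bucket index.
        return zlib.crc32(str(node).encode("utf-8")) % self.num_buckets

    def _bucket_pairs(self, hyperedge):
        buckets = sorted({self._bucket(v) for v in hyperedge})
        if not buckets:
            raise ValueError("a hyperedge must contain at least one node")
        pairs = [(a, b) for a in buckets for b in buckets if a != b]
        # If every node falls into one bucket, use the diagonal entry.
        return pairs or [(buckets[0], buckets[0])]

    def _decay_to(self, time):
        # Age the whole summary so that older interactions count less.
        if self.last_time is not None and time > self.last_time:
            factor = self.decay ** (time - self.last_time)
            for row in self.proximity:
                for j in range(self.num_buckets):
                    row[j] *= factor
        if self.last_time is None or time > self.last_time:
            self.last_time = time

    def process(self, hyperedge, time):
        """Score a new hyperedge, then fold it into the summary."""
        self._decay_to(time)
        pairs = self._bucket_pairs(hyperedge)
        mean = sum(self.proximity[a][b] for a, b in pairs) / len(pairs)
        score = 1.0 / (1.0 + mean)  # in (0, 1]; higher means less expected
        for a, b in pairs:
            self.proximity[a][b] += 1.0
        return score


summary = HyperedgeStreamSummary(num_buckets=8, decay=0.9)
first_score = summary.process({"alice", "bob", "carol"}, time=0)
repeat_score = summary.process({"alice", "bob", "carol"}, time=1)
```

In this sketch the memory footprint depends only on `num_buckets`, never on the number of nodes or hyperedges seen, and a repeated group interaction scores lower than its first occurrence. The method described in the abstract builds its proximity matrix differently; this example only illustrates the maintain-score-update loop over a fixed-size summary.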