We study a problem common in graph data analysis: many real-world complex systems involve higher-order interactions and are best encoded as hypergraphs, yet their datasets are often published or studied only as projections, that is, graphs with dyadic edges. To understand this problem, we first establish a theoretical framework that characterizes its implications and worst-case scenarios. The analysis motivates our formulation of a new task, supervised hypergraph reconstruction: reconstructing a real-world hypergraph from its projected graph with the help of existing knowledge of the application domain. To reconstruct hypergraph data, we start by analyzing hyperedge distributions in the projection, and on that basis build a framework with two modules: (1) to handle the enormous search space of potential hyperedges, we design a sampling strategy with efficacy guarantees that significantly narrows the space to a smaller set of candidates; (2) to identify hyperedges among the candidates, we further design a hyperedge classifier in two effective variants that capture structural features in the projection. Extensive experiments validate our claims, approach, and extensions. Remarkably, our approach outperforms all baselines by an order of magnitude in accuracy on hard datasets. Our code and data can be downloaded from bit.ly/SHyRe.
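To make the information loss concrete, the following minimal sketch illustrates projection under the common assumption that each hyperedge is replaced by a clique of dyadic edges. The function name `project` and the toy hypergraphs are hypothetical and are not taken from the released code. It shows two different hypergraphs that collapse to the same projected graph, which is why reconstruction from the projection alone is ambiguous.

```python
from itertools import combinations


def project(hyperedges):
    """Return the dyadic edge set obtained by expanding each hyperedge into a clique.

    Assumption: projection is the standard clique expansion; nodes are
    comparable values (here, integers) so each edge can be stored as a
    sorted pair.
    """
    edges = set()
    for hyperedge in hyperedges:
        # Every pair of nodes that co-occur in a hyperedge becomes one edge.
        for u, v in combinations(sorted(hyperedge), 2):
            edges.add((u, v))
    return edges


# One 3-node hyperedge versus three pairwise hyperedges on the same nodes.
h1 = [{1, 2, 3}]
h2 = [{1, 2}, {2, 3}, {1, 3}]

# Both hypergraphs yield the same triangle in the projection.
print(project(h1) == project(h2))
```

In this toy case the triangle in the projected graph could have come from either hypergraph, so some additional signal, such as domain knowledge, is needed to choose between the candidates.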