There has been increased interest in applying machine learning techniques to structured relational data based on an observed graph. Often, this graph is not fully representative of the true relationships among nodes. In these settings, building a generative model conditioned on the observed graph makes it possible to take graph uncertainty into account. Various existing techniques either rely on restrictive assumptions, fail to preserve topological properties in the samples, or are prohibitively expensive for larger graphs. In this work, we introduce the node copying model for constructing a distribution over graphs. A random graph is sampled by replacing each node's neighbors with those of a randomly sampled similar node. The sampled graphs preserve key characteristics of the graph structure without explicitly targeting them. Additionally, sampling from this model is extremely simple and scales linearly with the number of nodes. We demonstrate the usefulness of the copying model on three tasks. First, in node classification, a Bayesian formulation based on node copying achieves higher accuracy in sparse data settings. Second, we employ the proposed model to mitigate the effect of adversarial attacks on the graph topology. Last, incorporating the model into a recommendation system improves recall over state-of-the-art methods.
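The sampling step described above can be illustrated with a minimal Python sketch. It assumes the observed graph is stored as an adjacency list and that a set of candidate "similar" nodes is supplied for each node. The `similar` map and the `copy_prob` parameter are hypothetical illustrations, not the authors' definitions; how similarity is actually determined is left open here.

```python
import random
from typing import Dict, List, Optional, Sequence


def sample_copied_graph(
    neighbors: Dict[int, List[int]],
    similar: Dict[int, Sequence[int]],
    copy_prob: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Dict[int, List[int]]:
    """Draw one random graph by node copying (illustrative sketch).

    neighbors: adjacency list of the observed graph.
    similar:   HYPOTHETICAL map from each node to candidate "similar" nodes;
               the similarity criterion is an assumption, not the paper's.
    copy_prob: HYPOTHETICAL probability that a node's neighborhood is replaced;
               1.0 replaces every node, matching the description above.
    """
    if rng is None:
        rng = random.Random()
    sampled: Dict[int, List[int]] = {}
    for node in neighbors:
        # Fall back to the node itself if no similar candidates are given.
        candidates = list(similar.get(node) or [node])
        source = rng.choice(candidates) if rng.random() < copy_prob else node
        # Copy the source node's neighbor list; the result may be asymmetric.
        sampled[node] = list(neighbors.get(source, []))
    return sampled


# Usage: each node has exactly one similar candidate, so the draw is deterministic.
g = {0: [1, 2], 1: [0], 2: [0], 3: [2]}
sim = {0: [3], 1: [2], 2: [1], 3: [0]}
print(sample_copied_graph(g, sim, rng=random.Random(0)))
# → {0: [2], 1: [0], 2: [0], 3: [1, 2]}
```

Each node costs one random draw plus a copy of one neighbor list, so the work per sampled graph grows with the number of nodes and the sizes of the copied lists. Because neighborhoods are copied row by row, a sampled graph is generally directed even when the observed graph is undirected; symmetrizing it would be an extra, separate design choice.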