As graph data becomes more ubiquitous, robust inferential graph algorithms that can operate in these complex data domains are increasingly needed. In many cases of interest, inference is further complicated by the presence of adversarial data contamination. The adversary frequently changes the data distribution in ways that degrade statistical and algorithmic performance. We study this phenomenon in the context of vertex nomination, a semi-supervised information retrieval task for network data. Here, a common suite of methods relies on spectral graph embeddings, which have been shown to provide both good algorithmic performance and a flexible setting in which regularization techniques can be implemented to help mitigate the effect of an adversary. Many current regularization methods rely on direct network trimming to excise the adversarial contamination, although this direct trimming often gives rise to complicated dependency structures in the resulting graph. We propose a new trimming method that operates in model space and can address both block structure contamination and white noise contamination (contamination whose distribution is unknown). This model trimming is more amenable to theoretical analysis and also outperforms direct trimming in a number of simulations.
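To make the setting concrete, the following is a minimal sketch of vertex nomination built on a spectral graph embedding, run on an uncontaminated graph. It is not the method proposed here and it performs no trimming. The two-block stochastic block model, the block probabilities in `B`, the choice of ten seed vertices, and the centroid-distance ranking rule are all assumptions made for illustration, as are the helper names `adjacency_spectral_embedding` and `nominate`.

```python
import numpy as np

def adjacency_spectral_embedding(A, d):
    """Embed each vertex in R^d using the d largest-magnitude eigenpairs of A."""
    vals, vecs = np.linalg.eigh(A)            # A is symmetric, so eigh applies
    top = np.argsort(np.abs(vals))[::-1][:d]  # indices of the top-d eigenvalues
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def nominate(X, seeds):
    """Rank non-seed vertices by distance to the centroid of the seed embeddings."""
    centroid = X[seeds].mean(axis=0)
    dist = np.linalg.norm(X - centroid, axis=1)
    seed_set = set(seeds)
    return [int(v) for v in np.argsort(dist) if int(v) not in seed_set]

# Illustrative two-block stochastic block model (all values are assumptions).
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)            # block 0 holds the vertices of interest
B = np.array([[0.5, 0.1],
              [0.1, 0.3]])
P = B[labels][:, labels]                      # n x n matrix of edge probabilities
upper = np.triu(rng.random((n, n)) < P, k=1)  # sample edges above the diagonal
A = (upper | upper.T).astype(float)           # symmetric, hollow adjacency matrix

X = adjacency_spectral_embedding(A, d=2)
seeds = list(range(10))                       # known vertices of interest (block 0)
ranking = nominate(X, seeds)

top_k = ranking[:20]
precision = float(np.mean(labels[top_k] == 0))
print(f"precision at 20: {precision:.2f}")
```

The ranking places the vertices most similar to the seeds first, so the precision among the top nominees gives a simple measure of how much an adversary's contamination, or a regularization step applied before embedding, changes retrieval quality.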