In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always possible for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score, which makes model selection challenging, ii) in many cases it fails to consider underlying edge and node features, and iii) it is prohibitively slow to perform. In this work, we mitigate these issues by searching for scalar, domain-agnostic, and scalable metrics for evaluating and ranking GGMs. To this end, we study existing GGM metrics as well as neural-network-based metrics emerging from generative models of images, which use embeddings extracted from a task-specific network. Motivated by the power of certain Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. We design experiments to thoroughly test these metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. Depending on the number of available samples, we recommend one of two random-GNN-based metrics, which we show to be more expressive than pre-existing metrics. While we focus on applying these metrics to GGM evaluation, in practice they make it easy to compute the dissimilarity between any two sets of graphs, regardless of domain. Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.
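To make the general idea concrete, the following is a minimal sketch (not the authors' released implementation, which lives in the repository above) of how features from an untrained, randomly initialized GNN can be pooled into graph-level embeddings and how two sets of graphs can then be compared with a single scalar score, here an RBF-kernel MMD. All function names, hyperparameters, and the GCN-style propagation rule are illustrative assumptions.

```python
import torch


def random_gnn_embed(adj, feats, hidden_dim=16, num_layers=3, seed=0):
    """Embed one graph with an untrained, randomly initialized GNN.

    adj:   (n, n) dense adjacency matrix (float tensor)
    feats: (n, d) node feature matrix (float tensor); d must match across graphs
    Returns a graph-level embedding obtained by mean pooling node embeddings.
    """
    # Re-seeding ensures every graph is embedded with the *same* random weights.
    torch.manual_seed(seed)

    # GCN-style propagation: symmetrically normalized adjacency with self-loops.
    n = adj.shape[0]
    a_hat = adj + torch.eye(n)
    d_inv_sqrt = torch.diag(a_hat.sum(dim=1).pow(-0.5))
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt

    h = feats
    for _ in range(num_layers):
        # Random, untrained weight matrix for this layer.
        w = torch.randn(h.shape[1], hidden_dim) / h.shape[1] ** 0.5
        h = torch.relu(a_norm @ h @ w)
    return h.mean(dim=0)


def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD with an RBF kernel between embedding sets x: (m, d), y: (n, d)."""
    def k(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()


def graph_set_distance(graphs_a, graphs_b):
    """graphs_a / graphs_b: lists of (adj, feats) pairs. Returns one scalar score."""
    emb_a = torch.stack([random_gnn_embed(a, f) for a, f in graphs_a])
    emb_b = torch.stack([random_gnn_embed(a, f) for a, f in graphs_b])
    return rbf_mmd(emb_a, emb_b).item()
```

Because the GNN is never trained, the score is cheap to compute and requires no domain-specific labels; the choice of kernel, readout, and distance function is exactly what the paper's experiments evaluate.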