As training deep learning models on large datasets takes substantial time and resources, it is desirable to construct a small synthetic dataset with which we can sufficiently train deep learning models. Recent works have explored solutions for condensing image datasets through complex bi-level optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. large-real data and small-synthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have their inherent limitations: (1) they are not directly applicable to graphs, where the data is discrete; and (2) the condensation process is computationally expensive due to the involved nested optimization. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets, where we model the discrete graph structure as a probabilistic model. We further propose a one-step gradient matching scheme, which performs gradient matching for only a single step without training the network weights. Our theoretical analysis shows that this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance, and our method is significantly faster than multi-step gradient matching (e.g., 15x on CIFAR10 for synthesizing 500 graphs).
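To make the core idea concrete, below is a minimal sketch of one-step gradient matching, assuming PyTorch and a toy MLP in place of a GNN; for brevity it condenses continuous features rather than the Bernoulli-parameterized graph structure described above, and all names (`match_loss`, `one_step_update`, etc.) are hypothetical, not taken from the paper's released code.

```python
# Sketch: one-step gradient matching for dataset condensation.
# Assumption: a toy MLP on continuous features stands in for the GNN/graph case.
import torch
import torch.nn as nn
import torch.nn.functional as F

def match_loss(grads_real, grads_syn):
    # Sum of (1 - cosine similarity) over per-layer gradients, one common
    # choice of gradient-matching distance.
    loss = 0.0
    for gr, gs in zip(grads_real, grads_syn):
        loss = loss + (1 - F.cosine_similarity(gr.flatten(), gs.flatten(), dim=0))
    return loss

def one_step_update(model, x_real, y_real, x_syn, y_syn, opt_syn):
    params = list(model.parameters())
    # Gradients of the classification loss w.r.t. freshly initialized weights,
    # computed once on real data -- no inner training loop over the weights.
    grads_real = torch.autograd.grad(F.cross_entropy(model(x_real), y_real), params)
    grads_real = [g.detach() for g in grads_real]
    # Same gradients on the synthetic data, kept in the autograd graph so the
    # matching loss can be backpropagated into x_syn.
    grads_syn = torch.autograd.grad(F.cross_entropy(model(x_syn), y_syn),
                                    params, create_graph=True)
    loss = match_loss(grads_real, grads_syn)
    opt_syn.zero_grad()
    loss.backward()
    opt_syn.step()  # only the synthetic data is updated
    return loss.item()

# Toy usage: condense 1000 real examples into 10 learnable synthetic ones.
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
x_real, y_real = torch.randn(1000, 32), torch.randint(0, 4, (1000,))
x_syn = torch.randn(10, 32, requires_grad=True)
y_syn = torch.arange(10) % 4
opt_syn = torch.optim.Adam([x_syn], lr=0.01)
for it in range(100):
    # Re-sampling the network initialization each iteration replaces the
    # nested optimization over weights used in multi-step gradient matching.
    for m in model:
        if isinstance(m, nn.Linear):
            m.reset_parameters()
    one_step_update(model, x_real, y_real, x_syn, y_syn, opt_syn)
```

Because the network weights are never trained, each outer iteration costs a single pair of forward/backward passes, which is the source of the speedup over multi-step matching.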