As training deep learning models on large datasets takes substantial time and resources, it is desirable to construct a small synthetic dataset on which deep learning models can be trained sufficiently well. Recent works have explored solutions for condensing image datasets through complex bi-level optimization. For instance, dataset condensation (DC) matches network gradients w.r.t. large-real data and small-synthetic data, where the network weights are optimized for multiple steps at each outer iteration. However, existing approaches have their inherent limitations: (1) they are not directly applicable to graphs where the data is discrete; and (2) the condensation process is computationally expensive due to the involved nested optimization. To bridge the gap, we investigate efficient dataset condensation tailored for graph datasets, where we model the discrete graph structure as a probabilistic model. We further propose a one-step gradient matching scheme, which performs gradient matching for only a single step without training the network weights. Our theoretical analysis shows that this strategy can generate synthetic graphs that lead to lower classification loss on real graphs. Extensive experiments on various graph datasets demonstrate the effectiveness and efficiency of the proposed method. In particular, we are able to reduce the dataset size by 90% while approximating up to 98% of the original performance, and our method is significantly faster than multi-step gradient matching (e.g., 15x in CIFAR10 for synthesizing 500 graphs). Code is available at \url{https://github.com/amazon-research/DosCond}.
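To illustrate the core idea of one-step gradient matching, the following is a minimal sketch, not the paper's implementation: a linear regression model on plain feature vectors stands in for the GNN and graph data, and the function names (`grad_mse`, `one_step_gradient_matching`) and all hyperparameters are assumptions for illustration. At each outer iteration, the network weights are freshly sampled and never trained; only the synthetic inputs are updated to match the real-data gradient.

```python
import numpy as np

def grad_mse(W, X, Y):
    """Gradient of the MSE loss 0.5/n * ||XW - Y||_F^2 w.r.t. the weights W."""
    return X.T @ (X @ W - Y) / len(X)

def one_step_gradient_matching(X_real, Y_real, n_syn, n_outer=800, lr=0.02, seed=0):
    """Learn synthetic inputs whose training gradient matches the real-data
    gradient at freshly sampled random weights. The weights themselves are
    never optimized -- the 'one-step' part that removes the inner loop.
    (In the paper the synthetic data is a graph whose discrete structure is
    parameterized probabilistically; here it is a dense feature matrix.)"""
    rng = np.random.default_rng(seed)
    d, C = X_real.shape[1], Y_real.shape[1]
    X_syn = 0.1 * rng.standard_normal((n_syn, d))
    Y_syn = Y_real[:n_syn].copy()              # keep synthetic labels fixed
    for _ in range(n_outer):
        W = 0.1 * rng.standard_normal((d, C))  # fresh random init each iteration
        g_real = grad_mse(W, X_real, Y_real)
        R = X_syn @ W - Y_syn                  # synthetic-data residual
        G = X_syn.T @ R / n_syn - g_real       # gradient-matching residual
        # Analytic gradient of ||G||_F^2 w.r.t. X_syn for this linear model;
        # an autodiff framework would compute this for a general network.
        X_syn -= lr * (2.0 / n_syn) * (R @ G.T + X_syn @ G @ W.T)
    return X_syn, Y_syn
```

Because no inner training of `W` occurs, each outer iteration costs a single gradient evaluation per dataset, which is the source of the speedup over multi-step matching.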