A large body of theoretical and empirical evidence shows that flatter local minima tend to improve generalization. Adversarial Weight Perturbation (AWP) is an emerging technique for finding such minima efficiently and effectively. In AWP, we minimize the loss with respect to a bounded worst-case perturbation of the model parameters, thereby favoring local minima with small loss in a neighborhood around them. The benefits of AWP, and more generally the connections between flatness and generalization, have been extensively studied for i.i.d. data such as images. In this paper, we study this phenomenon for graph data. Along the way, we first derive a generalization bound for non-i.i.d. node classification tasks. Then we identify a vanishing-gradient issue with all existing formulations of AWP and propose a new Weighted Truncated AWP (WT-AWP) to alleviate it. We show that regularizing graph neural networks with WT-AWP consistently improves both natural and robust generalization across many different graph learning tasks and models.
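For concreteness, the following is a minimal PyTorch-style sketch of one vanilla AWP training step, i.e. approximately solving min_w max_{||δ|| ≤ γ||w||} L(w + δ). The function name `awp_training_step` and the hyperparameter `gamma` are illustrative choices, not names from the paper, and the sketch omits the paper's WT-AWP refinements (weighting the perturbed loss against the natural loss and truncating the perturbation to avoid vanishing gradients).

```python
import torch
import torch.nn.functional as F

def awp_training_step(model, optimizer, inputs, targets, gamma=0.01):
    """One optimization step with vanilla Adversarial Weight Perturbation.

    Sketch only: `gamma` (hypothetical name) is the relative radius of the
    weight-perturbation ball. WT-AWP's loss weighting and truncation are
    intentionally left out.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Gradient of the natural loss w.r.t. the current weights.
    loss = F.cross_entropy(model(inputs), targets)
    grads = torch.autograd.grad(loss, params)

    # 2) Approximate the worst-case perturbation with one normalized
    #    ascent step, scaled per-parameter by the weight norm.
    deltas = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            delta = gamma * p.norm() * g / (g.norm() + 1e-12)
            p.add_(delta)
            deltas.append(delta)

    # 3) Back-propagate the loss evaluated at the perturbed weights.
    optimizer.zero_grad()
    perturbed_loss = F.cross_entropy(model(inputs), targets)
    perturbed_loss.backward()

    # 4) Restore the original weights, then update them using the
    #    gradients computed at the perturbed point.
    with torch.no_grad():
        for p, delta in zip(params, deltas):
            p.sub_(delta)
    optimizer.step()
    return perturbed_loss.item()
```

Because the update direction comes from the perturbed point rather than the current weights, minimizing this objective penalizes sharp minima whose loss rises quickly under small weight perturbations.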