The field of Deep Learning is rich with empirical evidence of human-like performance on a variety of prediction tasks. However, despite these successes, the recent Predicting Generalization in Deep Learning (PGDL) NeurIPS 2020 competition suggests that there is a need for more robust and efficient measures of network generalization. In this work, we propose a new framework for evaluating the generalization capabilities of trained networks. We use perturbation response (PR) curves that capture the accuracy change of a given network as a function of varying levels of training sample perturbation. From these PR curves, we derive novel statistics that capture generalization capability. Specifically, we introduce two new measures for accurately predicting generalization gaps, the Gi-score and Pal-score, inspired by the Gini coefficient and Palma ratio (measures of income inequality). Applying our framework to intra- and inter-class sample mixup, we attain better predictive scores than the current state-of-the-art measures on a majority of tasks in the PGDL competition. In addition, we show that our framework and the proposed statistics can capture the extent to which a trained network is invariant to a given parametric input transformation, such as rotation or translation. These generalization gap prediction statistics therefore also provide a useful means for selecting network architectures and hyperparameters that are invariant to a given perturbation.
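To make the idea concrete, below is a minimal sketch of how Gini- and Palma-style statistics could be computed from a PR curve. The abstract does not give the exact definitions of the Gi-score and Pal-score, so the choice of applying the classical Gini coefficient and Palma ratio to per-level accuracy drops, as well as the function names and the example curve values, are illustrative assumptions, not the paper's actual formulas.

```python
import numpy as np

def gini(values):
    """Gini coefficient of a non-negative 1-D array (0 = perfect equality)."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    if n == 0 or v.sum() == 0:
        return 0.0
    # Equivalent to 1 minus twice the area under the Lorenz curve,
    # computed from cumulative shares of the sorted values.
    shares = np.cumsum(v) / v.sum()
    return (n + 1 - 2 * shares.sum()) / n

def palma(values):
    """Palma ratio: share held by the top 10% over the bottom 40%."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    top = v[int(np.ceil(0.9 * n)):].sum()
    bottom = v[:int(np.floor(0.4 * n))].sum()
    return top / bottom if bottom > 0 else np.inf

# Hypothetical PR curve: accuracy of a trained network at increasing
# perturbation levels (e.g., mixup interpolation strength).
accuracies = np.array([0.95, 0.94, 0.92, 0.88, 0.82, 0.74,
                       0.65, 0.55, 0.44, 0.33, 0.22])

# Treat per-level accuracy drops as the "income" distribution whose
# inequality the two statistics summarize (an assumed reading of the paper).
drops = accuracies[0] - accuracies

gi_score = gini(drops)    # stand-in for the paper's Gi-score
pal_score = palma(drops)  # stand-in for the paper's Pal-score
print(f"Gi-score ~ {gi_score:.3f}, Pal-score ~ {pal_score:.3f}")
```

Under this reading, a network whose accuracy collapses abruptly at some perturbation level yields a more "unequal" distribution of drops than one that degrades gracefully, so the inequality statistics summarize the shape of the PR curve in a single number.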