Weight norm $\|w\|$ and margin $\gamma$ enter learning theory via the normalized margin $\gamma/\|w\|$. Since standard neural network optimizers do not control the normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control the normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that the answer is no: networks can be produced in which the normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that the answer is yes: in a standard training setup, test performance closely tracks the normalized margin. The paper proposes a Gaussian process model as a promising explanation for this behavior.
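As a minimal sketch of the quantity in question (assuming a binary linear classifier; the function name and the toy data below are illustrative, not from the paper), the normalized margin is the smallest signed score $y_i \langle w, x_i \rangle$ over the training set, divided by $\|w\|$:

```python
import numpy as np

def normalized_margin(w, X, y):
    """Return the margin gamma = min_i y_i * <w, x_i> and gamma / ||w||.

    Sketch for a binary linear classifier with labels y in {-1, +1};
    illustrative only, not the paper's experimental setup.
    """
    scores = y * (X @ w)          # signed score for each training example
    gamma = scores.min()          # (unnormalized) margin
    return gamma, gamma / np.linalg.norm(w)

# Toy usage: a small separable 2-D dataset and an arbitrary weight vector.
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -1.5]])
y = np.array([1.0, 1.0, -1.0])
w = np.array([1.0, 1.0])
gamma, norm_margin = normalized_margin(w, X, y)
print(gamma, norm_margin)
```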