We show generalisation error bounds for deep learning with two main improvements over the state of the art. (1) Our bounds have no explicit dependence on the number of classes except for logarithmic factors. This holds even when formulating the bounds in terms of the $L^2$-norm of the weight matrices, where previous bounds exhibit at least a square-root dependence on the number of classes. (2) We adapt the classic Rademacher analysis of DNNs to incorporate weight sharing -- a task of fundamental theoretical importance which was previously attempted only under very restrictive assumptions. In our results, each convolutional filter contributes only once to the bound, regardless of how many times it is applied. Further improvements exploiting pooling and sparse connections are provided. The presented bounds scale as the norms of the parameter matrices, rather than the number of parameters. In particular, in contrast to bounds based on parameter counting, they are asymptotically tight (up to log factors) when the weights approach initialisation, making them suitable as a basic ingredient in bounds sensitive to the optimisation procedure. We also show how to adapt the recent technique of loss function augmentation to our situation to replace spectral norms by empirical analogues whilst maintaining the advantages of our approach.
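For orientation only, the display below sketches the generic shape of a spectrally-normalised, norm-based margin bound of the kind discussed above (in the style of Bartlett, Foster and Telgarsky, 2017); it is a schematic illustration under simplifying assumptions, not a statement of the present results, and it suppresses constants, logarithmic factors and data-norm terms inside the $\widetilde{O}$. Here $A_1, \dots, A_L$ denote the weight matrices, $M_1, \dots, M_L$ fixed reference matrices (e.g. the initialisation), $\lVert \cdot \rVert_\sigma$ the spectral norm, $\lVert \cdot \rVert_{2,1}$ a matrix group $(2,1)$-norm, $\gamma$ the margin, $n$ the sample size, and $\widehat{R}_\gamma$ the empirical margin loss:
\[
\mathbb{P}\bigl[\arg\max_{y'} f(x)_{y'} \neq y\bigr]
\;\le\;
\widehat{R}_\gamma(f)
\;+\;
\widetilde{O}\!\left(
\frac{\prod_{i=1}^{L}\lVert A_i\rVert_\sigma}{\gamma\sqrt{n}}
\Biggl(\sum_{i=1}^{L}
\frac{\lVert A_i - M_i\rVert_{2,1}^{2/3}}{\lVert A_i\rVert_\sigma^{2/3}}
\Biggr)^{\!3/2}
\right).
\]
Since the distance terms $\lVert A_i - M_i\rVert_{2,1}$ vanish as the weights approach the reference matrices, bounds of this shape become tight (up to log factors) near initialisation; the claims above concern bounds of this norm-based type, improved to remove the explicit class-count dependence and to account for weight sharing in convolutional layers.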