$L_{p}$-norm regularization schemes such as $L_{0}$-, $L_{1}$-, and $L_{2}$-norm regularization, and $L_{p}$-norm-based techniques such as weight decay and group LASSO, compute a quantity that depends on each model weight considered in isolation from the others. This paper describes a novel regularizer that is not based on an $L_{p}$-norm. In contrast with $L_{p}$-norm-based regularization, this regularizer is concerned with the spatial arrangement of weights within a weight matrix. The regularizer is an additive term in the loss function; it is differentiable, simple and fast to compute, and scale-invariant, requires only a trivial amount of additional memory, and is easily parallelized. Empirically, this method yields approximately an order-of-magnitude reduction in the number of nonzero model parameters at a given level of accuracy.
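For concreteness, the sketch below illustrates the baseline the abstract contrasts against: conventional $L_{1}$ and (squared) $L_{2}$ penalties added to a training loss, each computed element-wise from the weights, i.e., with every weight considered in isolation. The novel spatial regularizer itself is not specified in this abstract and is not reproduced here; all function and variable names (`lp_penalties`, `W`, `lam`, `task_loss`) are illustrative assumptions, not part of the paper.

```python
import torch

def lp_penalties(weight: torch.Tensor):
    """Standard element-wise penalties: each term depends only on the
    magnitudes of individual weights, not on their position in the matrix."""
    l1 = weight.abs().sum()    # L1-norm penalty
    l2 = weight.pow(2).sum()   # squared L2-norm (weight-decay form)
    return l1, l2

# Hypothetical usage: add a penalty term to the task loss.
W = torch.randn(64, 128, requires_grad=True)
task_loss = torch.tensor(0.0)      # placeholder for the data-fit term
l1, l2 = lp_penalties(W)
lam = 1e-4                         # regularization strength (illustrative)
total_loss = task_loss + lam * l1  # e.g., L1 regularization as an additive term
```

Because these penalties sum per-weight contributions, permuting the entries of `W` leaves them unchanged; the regularizer proposed in the paper differs precisely in that it is sensitive to such spatial rearrangement.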