We propose to minimize a generic differentiable loss function with an $L_1$ penalty using a redundant reparametrization and straightforward stochastic gradient descent. Our proposal directly generalizes a series of previous ideas suggesting that the $L_1$ penalty may be equivalent to a differentiable reparametrization with weight decay. We prove that the proposed method, \textit{spred}, is an exact solver of $L_1$ and that the reparametrization trick is completely ``benign'' for a generic nonconvex function. Practically, we demonstrate the usefulness of the method in (1) training sparse neural networks to perform gene selection tasks, which involve finding relevant features in a very high-dimensional space, and (2) the neural network compression task, for which previous attempts to apply the $L_1$ penalty have been unsuccessful. Conceptually, our result bridges the gap between sparsity in deep learning and conventional statistical learning.
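As a minimal sketch of the reparametrization idea (the element-wise factorization $w = u \odot a$ and the symbols $u$, $a$, $\lambda$ are illustrative notation here, not necessarily the exact parametrization used in the main text), the $L_1$ penalty can be written as a minimum of weight decay over redundant factors:
\[
\lambda \lVert w \rVert_1 \;=\; \min_{u \odot a = w} \frac{\lambda}{2}\left(\lVert u \rVert_2^2 + \lVert a \rVert_2^2\right),
\]
so minimizing $\ell(u \odot a) + \frac{\lambda}{2}\left(\lVert u \rVert_2^2 + \lVert a \rVert_2^2\right)$ over $(u, a)$ with plain SGD attains the same minimum as minimizing $\ell(w) + \lambda \lVert w \rVert_1$ over $w$.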