We identify and prove a general principle: $L_1$ sparsity can be achieved with a redundant parametrization plus an $L_2$ penalty. Our result leads to a simple algorithm, \textit{spred}, that seamlessly integrates $L_1$ regularization into any modern deep learning framework. Practically, we (1) demonstrate the efficiency of \textit{spred} on conventional tasks such as the lasso and sparse coding, (2) benchmark our method for nonlinear feature selection on six gene-selection tasks, and (3) illustrate how the method achieves structured and unstructured sparsity in deep learning in an end-to-end manner. Conceptually, our result bridges the gap between understanding the inductive bias of redundant parametrizations, which are common in deep learning, and conventional statistical learning.
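As a minimal sketch of the principle stated above (not the paper's implementation), the snippet below illustrates the redundant parametrization $w = u \odot v$ with an $L_2$ penalty on $(u, v)$, which induces an $L_1$ penalty on the effective weights $w$ since $\min_{u \odot v = w} \lambda(\|u\|^2 + \|v\|^2) = 2\lambda \|w\|_1$. The example assumes PyTorch, and the function name \texttt{spred\_style\_lasso} and the hyperparameter values are ours for illustration.

\begin{verbatim}
import torch

def spred_style_lasso(X, y, lam=0.1, lr=1e-2, steps=5000):
    """Lasso-like regression via the redundant parametrization w = u * v
    with an L2 penalty on (u, v), trained by plain gradient descent.
    (Illustrative sketch only; not the authors' reference code.)"""
    n, d = X.shape
    u = torch.randn(d, requires_grad=True)
    v = torch.randn(d, requires_grad=True)
    opt = torch.optim.SGD([u, v], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        w = u * v  # redundant (Hadamard) parametrization of the weights
        mse = ((X @ w - y) ** 2).mean()
        l2 = lam * (u.pow(2).sum() + v.pow(2).sum())  # acts as 2*lam*||w||_1 on w
        (mse + l2).backward()
        opt.step()
    # Effective weights; entries for irrelevant features are driven toward zero
    # by the induced L1 penalty.
    return (u * v).detach()

# Usage: w_hat = spred_style_lasso(torch.randn(100, 20), torch.randn(100))
\end{verbatim}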