Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However, the Lasso only applies to linear models. Here we introduce LassoNet, a neural network framework with global feature selection. Our approach enforces a hierarchy: specifically, a feature can participate in a hidden unit only if its linear representative is active. Unlike other approaches to feature selection for neural nets, our method uses a modified objective function with constraints, and so directly integrates feature selection with parameter learning. As a result, it delivers an entire regularization path of solutions with a range of feature sparsity. In systematic experiments, LassoNet significantly outperforms state-of-the-art methods for feature selection and regression. The LassoNet method uses projected proximal gradient descent, and generalizes directly to deep networks. It can be implemented by adding just a few lines of code to a standard neural network.
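The following is a minimal sketch, in PyTorch, of the kind of setup the abstract describes: a linear skip connection $\theta$ trained in parallel with a small feed-forward network, and a proximal step after each gradient step that soft-thresholds $\theta$ and clips the first-layer weights so a feature's hidden-unit weights can be nonzero only when its linear coefficient is. The names LassoNetSketch and hier_prox_step are illustrative, and the clipping below is a simplified stand-in for the paper's exact hierarchical proximal operator, not the authors' implementation.

import torch
import torch.nn as nn

class LassoNetSketch(nn.Module):
    """Sketch of a LassoNet-style model: a linear skip connection (theta)
    in parallel with a small feed-forward network."""
    def __init__(self, d_in, d_hidden, d_out=1):
        super().__init__()
        self.skip = nn.Linear(d_in, d_out, bias=False)   # linear part, theta
        self.layer1 = nn.Linear(d_in, d_hidden)          # first hidden layer, W1
        self.layer2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.skip(x) + self.layer2(torch.relu(self.layer1(x)))

def hier_prox_step(model, lam, M, lr):
    """Simplified stand-in for the hierarchical proximal operator:
    soft-threshold theta, then clip each feature's first-layer weights so that
    ||W1[:, j]||_inf <= M * |theta_j|, enforcing the hierarchy constraint."""
    with torch.no_grad():
        theta = model.skip.weight                         # shape (d_out, d_in)
        theta.copy_(torch.sign(theta) * torch.clamp(theta.abs() - lr * lam, min=0.0))
        bound = M * theta.abs().amax(dim=0)               # per-feature bound, shape (d_in,)
        W1 = model.layer1.weight                          # shape (d_hidden, d_in)
        W1.copy_(torch.maximum(torch.minimum(W1, bound), -bound))

# Toy usage: one proximal-gradient iteration on random data.
torch.manual_seed(0)
X, y = torch.randn(64, 20), torch.randn(64, 1)
model = LassoNetSketch(d_in=20, d_hidden=10)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss = nn.functional.mse_loss(model(X), y)
opt.zero_grad(); loss.backward(); opt.step()              # plain gradient step on all parameters
hier_prox_step(model, lam=0.1, M=10.0, lr=1e-2)           # then the (simplified) proximal step

In this sketch, sweeping the penalty lam from large to small would trace out a regularization path of increasingly dense feature sets, while M controls how tightly the hidden-layer weights are tied to the linear coefficients; both names are assumptions made for illustration.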