The parameters of a neural network are naturally organized in groups, some of which may not contribute to its overall performance. To prune unimportant groups of parameters, we can add a non-differentiable penalty to the objective function and minimize it using proximal gradient methods. In this paper, we derive the weighted proximal operators, a necessary component of these proximal methods, for two structured sparsity-inducing penalties. Moreover, these operators can be approximated efficiently with a numerical solver, and despite this approximation, we prove that existing convergence guarantees are preserved when they are integrated into a generic adaptive proximal method. Finally, we show that this adaptive method, together with the weighted proximal operators derived here, is indeed capable of finding solutions with structured sparsity patterns on representative examples from computer vision and natural language processing.
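For concreteness, a minimal sketch of the setting described above; the symbols $f$, $h$, $\lambda$, $\mathcal{G}$, $D_k$, and $\eta_k$ are illustrative placeholders, not the paper's own notation: the objective is a smooth loss plus a non-differentiable group penalty, and each step of an adaptive proximal method evaluates a weighted proximal operator.

% Hedged sketch; symbols are illustrative assumptions, not taken from the paper.
% f: smooth loss, h: structured sparsity penalty over parameter groups g in G,
% D_k: positive definite weight (e.g. a diagonal preconditioner), eta_k: step size.
\begin{align}
  \min_{x} \; & f(x) + h(x),
  \qquad h(x) = \lambda \sum_{g \in \mathcal{G}} \lVert x_g \rVert_2, \\
  % Weighted proximal operator induced by the norm ||.||_D:
  \operatorname{prox}^{D}_{h}(v) \; &= \; \operatorname*{arg\,min}_{x} \; h(x)
    + \tfrac{1}{2} \lVert x - v \rVert_{D}^{2}, \\
  % One adaptive proximal-gradient step using the weighted operator:
  x_{k+1} \; &= \; \operatorname{prox}^{D_k}_{\eta_k h}\!\bigl(x_k - \eta_k D_k^{-1} \nabla f(x_k)\bigr).
\end{align}

For a plain group-lasso penalty with $D_k = I$, the operator reduces to blockwise soft-thresholding; with a general $D_k$, as used here, it typically has no closed form and is the quantity approximated by the numerical solver mentioned above.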