We propose a new gradient-based approach for extracting sub-architectures from a given large model. In contrast to existing pruning methods, which cannot disentangle the network architecture from the corresponding weights, our architecture-pruning scheme produces transferable new structures that can be successfully retrained to solve different tasks. We focus on a transfer-learning setup where architectures can be trained on a large dataset but very few data points are available for fine-tuning them on new tasks. We define a new gradient-based algorithm that trains architectures of arbitrarily low complexity independently of the attached weights. Given a search space defined by an existing large neural model, we reformulate the architecture search task as a complexity-penalized subset-selection problem and solve it through a two-temperature relaxation scheme. We provide theoretical convergence guarantees and validate the proposed transfer-learning strategy on real data.
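As a rough illustration of the formulation described above (the symbols $z$, $w$, $\lambda$, $\tau$ and the sigmoid relaxation are our own illustrative assumptions, not necessarily the paper's notation), a complexity-penalized subset-selection problem over the components of a large model can be written with binary gates $z$ and relaxed for gradient-based optimization:
\[
\min_{z \in \{0,1\}^d} \; \mathcal{L}\big(f_{w}(x; z)\big) + \lambda \, \|z\|_0,
\qquad
z_i \;\approx\; \sigma\!\left(\alpha_i / \tau\right),
\]
where $\mathcal{L}$ is the task loss of the gated network $f_w(\cdot;z)$, $\lambda$ penalizes the number of retained components, and $\tau$ is a temperature controlling how closely the sigmoid relaxation of each gate approximates a hard binary choice. In a two-temperature scheme of this kind, one could plausibly use separate temperatures (or annealing schedules) for the gate relaxation and for the trade-off between the data loss and the complexity term; the exact construction in the paper may differ.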