Fitting a model into GPU memory during training is an increasing concern as models continue to grow. Parameter sharing can reduce memory requirements, but existing methods only share parameters between identical layers, limiting their impact. This paper removes these restrictions with a novel task called Neural Parameter Allocation Search (NPAS), where the goal is to generate the weights for a network using a given parameter budget. NPAS requires new techniques that morph the available parameters to fit any architecture. To address this new task, we introduce Shapeshifter Networks (SSNs), which automatically learn where and how to share parameters between all layers in a network, even between layers of differing sizes and operations. SSNs require no modifications to the loss function or architecture, making them easy to use. We evaluate SSNs in key NPAS settings using seven network architectures across diverse tasks, including image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high-performing models even when using as little as 1% of the parameters.
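To make the NPAS setting concrete, the sketch below shows how layers of different shapes could draw their weights from a single fixed-size parameter bank. This is only a hedged illustration of the task formulation, not the authors' method: the class names (`ParameterBank`, `SharedLinear`) and the naive tile-and-reshape mapping are hypothetical; SSNs instead learn where and how parameters are shared.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterBank(nn.Module):
    """A fixed budget of trainable parameters shared by many layers.

    Hypothetical illustration of the NPAS idea: every layer draws its
    weights from this bank, so the total number of trainable parameters
    is bounded by `budget` regardless of the network's architecture.
    """
    def __init__(self, budget: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(budget) * 0.02)

    def materialize(self, shape, offset: int = 0):
        """Build a weight tensor of `shape` from the bank.

        Here we simply tile the bank and reshape; Shapeshifter Networks
        *learn* the mapping from bank parameters to layer weights.
        """
        n = math.prod(shape)
        reps = (offset + n + self.bank.numel() - 1) // self.bank.numel()
        flat = self.bank.repeat(reps)[offset:offset + n]
        return flat.view(*shape)

class SharedLinear(nn.Module):
    """A linear layer whose weight lives in the shared bank (bias is local)."""
    def __init__(self, bank: ParameterBank, in_f: int, out_f: int, offset: int = 0):
        super().__init__()
        self.bank, self.shape, self.offset = bank, (out_f, in_f), offset
        self.bias = nn.Parameter(torch.zeros(out_f))

    def forward(self, x):
        w = self.bank.materialize(self.shape, self.offset)
        return F.linear(x, w, self.bias)

# Two layers of different sizes share one 10k-parameter budget,
# far fewer parameters than the layers would need individually.
bank = ParameterBank(budget=10_000)
layer1 = SharedLinear(bank, 128, 256)
layer2 = SharedLinear(bank, 256, 64, offset=512)
y = layer2(torch.relu(layer1(torch.randn(4, 128))))
```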