Fitting a model into GPU memory during training is an increasing concern as models continue to grow. To address this issue, we present Shapeshifter Networks (SSNs), a flexible neural network framework that decouples layers from model weights, enabling us to implement any neural network with an arbitrary number of parameters. In SSNs each layer obtains weights from a parameter store that decides where and how to allocate parameters to layers. This can result in sharing parameters across layers even when they have different sizes or perform different operations. SSNs do not require any modifications to a model's loss function or architecture, making them easy to use. Our approach can create parameter-efficient networks by using a relatively small number of weights, or can improve a model's performance by adding model capacity during training without affecting the computational resources required at test time. We evaluate SSNs using seven network architectures across diverse tasks that include image classification, bidirectional image-sentence retrieval, and phrase grounding, creating high-performing models even when using as little as 1% of the parameters.
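To make the parameter-store idea concrete, below is a minimal sketch (not the authors' implementation) of layers drawing their weights from a shared bank, so parameter count is decoupled from the number and sizes of layers. The names `ParameterStore` and `SharedLinear`, and the simple tile-and-slice allocation strategy, are illustrative assumptions; SSNs decide where and how to map parameters to layers rather than using this fixed scheme.

```python
# Sketch of a shared parameter store: layers of different shapes request
# weights from one flat bank of trainable parameters (hypothetical classes,
# not the paper's actual allocation method).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ParameterStore(nn.Module):
    """Holds a single flat bank of trainable weights that layers draw from."""

    def __init__(self, bank_size: int):
        super().__init__()
        self.bank = nn.Parameter(torch.randn(bank_size) * 0.02)

    def get_weight(self, shape) -> torch.Tensor:
        # Tile the bank if a layer needs more parameters than it holds,
        # then slice and reshape -- one simple way to share weights across
        # layers with different sizes.
        numel = int(torch.tensor(shape).prod())
        repeats = -(-numel // self.bank.numel())  # ceiling division
        flat = self.bank.repeat(repeats)[:numel]
        return flat.view(shape)


class SharedLinear(nn.Module):
    """A linear layer whose weight matrix lives in the shared store."""

    def __init__(self, store: ParameterStore, in_features: int, out_features: int):
        super().__init__()
        self.store = store
        self.shape = (out_features, in_features)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weight = self.store.get_weight(self.shape)
        return F.linear(x, weight, self.bias)


# Two layers of different sizes backed by one small bank of weights.
store = ParameterStore(bank_size=4096)
layer1 = SharedLinear(store, 128, 64)
layer2 = SharedLinear(store, 64, 10)
out = layer2(torch.relu(layer1(torch.randn(8, 128))))
print(out.shape)  # torch.Size([8, 10])
```

In this toy version the two linear layers together would normally need 128x64 + 64x10 = 8,832 weights, but both are served from a 4,096-parameter bank, illustrating how total parameter count can be set independently of the architecture.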