Gradient-based multilevel optimization (MLO) has gained attention as a framework for studying numerous problems, ranging from hyperparameter optimization and meta-learning to neural architecture search and reinforcement learning. However, gradients in MLO, which are obtained by composing best-response Jacobians via the chain rule, are notoriously difficult to implement and memory/compute intensive. We take an initial step towards closing this gap by introducing Betty, a software library for large-scale MLO. At its core, we devise a novel dataflow graph for MLO, which allows us to (1) develop efficient automatic differentiation for MLO that reduces the computational complexity from O(d^3) to O(d^2), (2) incorporate systems support such as mixed-precision and data-parallel training for scalability, and (3) facilitate implementation of MLO programs of arbitrary complexity while allowing a modular interface for diverse algorithmic and systems design choices. We empirically demonstrate that Betty can be used to implement an array of MLO programs, while also observing up to 11% increase in test accuracy, 14% decrease in GPU memory usage, and 20% decrease in training wall time over existing implementations on multiple benchmarks. We also showcase that Betty enables scaling MLO to models with hundreds of millions of parameters. We open-source the code at https://github.com/leopard-ai/betty.
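To make the kind of gradient computation described above concrete, the following is a minimal, self-contained PyTorch sketch of a two-level MLO program (hyperparameter optimization of per-example loss weights via a one-step-unrolled inner update). It is an illustrative assumption, not Betty's actual API: all variable names, the toy data, and the one-step unrolling strategy are chosen for exposition only; it shows how an upper-level gradient is obtained by chaining through the lower-level update (a best-response Jacobian), the operation the abstract refers to.

```python
# Hedged sketch of gradient-based bilevel optimization (NOT Betty's API).
# Lower level: model parameters w trained on a weighted training loss.
# Upper level: per-example weights updated via the validation loss of the
# one-step-unrolled lower-level parameters. All names/data are illustrative.
import torch

torch.manual_seed(0)

# Toy train/validation data for a 1-D linear regression (assumed for illustration).
x_tr, y_tr = torch.randn(32, 1), torch.randn(32, 1)
x_val, y_val = torch.randn(32, 1), torch.randn(32, 1)

w = torch.zeros(1, 1, requires_grad=True)        # lower-level parameters
sample_w = torch.zeros(32, requires_grad=True)   # upper-level parameters (per-example weights)
inner_lr, outer_lr = 0.1, 0.01

outer_opt = torch.optim.SGD([sample_w], lr=outer_lr)

for step in range(100):
    # Lower level: weighted training loss, one unrolled gradient step on w.
    per_example = ((x_tr @ w - y_tr) ** 2).squeeze()
    train_loss = (torch.sigmoid(sample_w) * per_example).mean()
    g_w, = torch.autograd.grad(train_loss, w, create_graph=True)
    w_unrolled = w - inner_lr * g_w              # keeps the graph back to sample_w

    # Upper level: validation loss evaluated at the unrolled parameters.
    val_loss = ((x_val @ w_unrolled - y_val) ** 2).mean()

    # Hypergradient: chain rule through the unrolled step (best-response Jacobian).
    outer_opt.zero_grad()
    val_loss.backward()
    outer_opt.step()

    # Commit the inner update, detaching so old graphs are not retained.
    w = w_unrolled.detach().requires_grad_(True)
```

In practice, a library such as Betty abstracts this pattern so that each level is specified as a separate problem and the cross-level gradient computation is handled by the library rather than hand-written unrolling as in the sketch above.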