ADAPT: Learning Task Mixtures for Budget-Constrained Instruction Tuning

We propose ADAPT, a meta-learning algorithm that \emph{learns} task sampling proportions under an explicit token budget for multi-task instruction tuning. Instead of fixing task weights by hand, ADAPT maintains a continuous distribution over tasks and updates it via meta-gradients of a smooth worst-case validation objective. This induces an adaptive curriculum that allocates more tokens to useful tasks while avoiding collapse. We instantiate ADAPT on three $\sim$1B-parameter open-weight LLMs (Gemma-3-1B, LLaMA-3.2-1B, Qwen-0.6B), training on 20 Natural Instructions task types under budgets of $1\%$, $5\%$, and $10\%$ of the available supervised tokens, and compare against strong supervised fine-tuning baselines that use uniform and size-proportional task mixing. Across 11 out-of-domain benchmarks spanning reasoning, reading comprehension, code generation, and instruction following, ADAPT matches or slightly improves average downstream performance relative to the best static mixture, while using fewer effective training tokens and reallocating budget toward harder, benchmark-aligned tasks.
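
To make the mixture update concrete, here is a toy sketch built entirely on assumptions, not on the authors' implementation. It assumes the task distribution is a softmax over logits `theta` and that the smooth worst-case objective is a temperature-`tau` log-sum-exp over per-task validation losses. The meta-gradient comes from a first-order, one-step lookahead: a model step of size `eta` changes validation loss j by roughly `-eta * <v_j, g_i>` per unit of mixture weight on training task i. A token budget is then split across tasks by largest-remainder rounding. All function names and the toy gradients are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch of a learned task mixture; not the ADAPT reference implementation.


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: maps task logits to a mixture on the simplex."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def smooth_worst_case(losses: Sequence[float], tau: float) -> float:
    """tau * log-sum-exp(losses / tau): a smooth upper bound on max(losses).

    It lies in [max(losses), max(losses) + tau * log(K)] and tends to the max as tau -> 0.
    """
    m = max(losses)
    return m + tau * math.log(sum(math.exp((l - m) / tau) for l in losses))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def meta_step(theta: Sequence[float], train_grads: Sequence[Sequence[float]],
              val_grads: Sequence[Sequence[float]], val_losses: Sequence[float],
              eta: float, tau: float, beta: float) -> List[float]:
    """One mixture update on the logits, assuming a first-order one-step lookahead.

    After a model step w' = w - eta * sum_i p_i * g_i, a first-order Taylor
    expansion gives dL_j/dp_i ~= -eta * <v_j, g_i>, where g_i is the training
    gradient of task i and v_j is the validation gradient of task j.
    """
    p = softmax(theta)
    # Gradient of the log-sum-exp w.r.t. each validation loss: higher losses weigh more.
    w = softmax([l / tau for l in val_losses])
    # dObjective/dp_i = sum_j w_j * dL_j/dp_i
    grad_p = [
        -eta * sum(w_j * dot(v_j, g_i) for w_j, v_j in zip(w, val_grads))
        for g_i in train_grads
    ]
    # Chain rule through the softmax: dObj/dtheta_k = p_k * (grad_p_k - <p, grad_p>).
    mean = dot(p, grad_p)
    grad_theta = [p_k * (g_k - mean) for p_k, g_k in zip(p, grad_p)]
    # Gradient descent on the logits, since we minimize the smooth worst-case loss.
    return [t - beta * g for t, g in zip(theta, grad_theta)]


def allocate_tokens(p: Sequence[float], budget: int) -> List[int]:
    """Split an integer token budget in proportion to p (largest-remainder rounding)."""
    raw = [p_i * budget for p_i in p]
    alloc = [math.floor(r) for r in raw]
    leftover = budget - sum(alloc)
    # Hand the remaining tokens to the tasks with the largest fractional parts.
    order = sorted(range(len(p)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical toy setup: 3 training tasks, 2 validation tasks, 2-D parameters.
    theta = [0.0, 0.0, 0.0]
    train_grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    val_grads = [[1.0, 0.0], [0.0, 1.0]]
    val_losses = [2.0, 0.5]
    theta = meta_step(theta, train_grads, val_grads, val_losses,
                      eta=0.1, tau=0.1, beta=1.0)
    p = softmax(theta)
    # Illustrative 5% budget over a hypothetical pool of 1,000,000 supervised tokens.
    budget = 1_000_000 * 5 // 100
    print(p, allocate_tokens(p, budget))
```

In this toy run, validation task 0 has the highest loss, so the log-sum-exp weights concentrate on it. The mixture shifts toward training task 0, whose gradient aligns with that validation gradient, and away from task 2, whose gradient opposes it. A larger `tau` spreads weight across validation tasks and yields a softer curriculum.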
