Deadline-Aware Online Scheduling for LLM Fine-Tuning with Spot Market Predictions

As foundation models grow in size, fine-tuning them becomes increasingly expensive. GPU spot instances offer a low-cost alternative to on-demand resources, but their volatile prices and availability make deadline-aware scheduling particularly challenging. We address this challenge by combining spot and on-demand instances. Notably, we show that prices and availability in a spot instance market are predictable, that prediction enables cost-efficient scheduling, and that this benefit is sensitive to estimation errors. We formulate an integer programming problem that captures the use of mixed instances under both price and availability dynamics. We propose a prediction-based online allocation algorithm built on the committed horizon control approach, which uses a \emph{commitment level} to enforce a partial sequence of decisions. For cases where predictions become inaccurate, we further present a complementary online algorithm that does not rely on predictions. We then develop an online policy selection algorithm that learns the best policy from a pool constructed by varying the parameters of both algorithms. We prove that the prediction-based algorithm achieves tighter performance bounds as prediction error decreases, and that the policy selection algorithm has a regret bound of $\mathcal{O}(\sqrt{T})$. Experimental results demonstrate that our online framework adaptively selects the best policy under varying spot market dynamics and prediction quality, consistently outperforming baselines and improving utility by up to 54.8\%.
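To make the policy selection idea concrete, the following is a minimal sketch of learning over a fixed policy pool with the classical Hedge (multiplicative weights) update. The abstract does not specify the selection rule, so Hedge is swapped in here purely as an illustration. It is a standard method whose expected regret grows on the order of $\sqrt{T \log N}$ for $N$ policies when utilities lie in $[0, 1]$. The function name `hedge_select`, the utility matrix layout, and the full-information feedback assumption are all hypothetical.

```python
import math
import random


def hedge_select(utilities, eta=None, seed=0):
    """Illustrative Hedge over a policy pool (not the paper's algorithm).

    utilities[t][i] is the utility in [0, 1] that policy i would have
    earned in round t (full-information feedback is assumed).
    Returns (total utility of the sampled choices, final distribution).
    """
    T = len(utilities)
    n = len(utilities[0])
    if eta is None:
        # Standard learning rate for a known horizon T.
        eta = math.sqrt(8.0 * math.log(n) / T)
    rng = random.Random(seed)
    probs = [1.0 / n] * n  # start from the uniform distribution
    total = 0.0
    for u in utilities:
        # Sample one policy according to the current distribution.
        choice = rng.choices(range(n), weights=probs)[0]
        total += u[choice]
        # Multiplicative update, then renormalize to avoid overflow.
        scaled = [p * math.exp(eta * ui) for p, ui in zip(probs, u)]
        s = sum(scaled)
        probs = [w / s for w in scaled]
    return total, probs


if __name__ == "__main__":
    # Hypothetical pool of three policies; policy 1 is always the best.
    rounds = [[0.0, 1.0, 0.2] for _ in range(200)]
    total, probs = hedge_select(rounds)
    print(total, probs)
```

In this toy run the distribution concentrates on the policy with the highest cumulative utility. In a setting like the one described above, each pool entry would correspond to one parameterization of the prediction-based or prediction-free algorithm.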
