In this work, we show that the impact of model capacity varies across timesteps: it is crucial in the early and late stages but largely negligible during the intermediate stage. Accordingly, we propose FlowBlending, a stage-aware multi-model sampling strategy that employs a large model at the capacity-sensitive stages and a small model at the intermediate stage. We further introduce simple criteria for choosing stage boundaries and provide a velocity-divergence analysis as an effective proxy for identifying capacity-sensitive regions. Across LTX-Video (2B/13B) and WAN 2.1 (1.3B/14B), FlowBlending achieves up to 1.65x faster inference with 57.35% fewer FLOPs, while maintaining the visual fidelity, temporal coherence, and semantic alignment of the large models. FlowBlending is also compatible with existing sampling-acceleration techniques, enabling up to 2x additional speedup. The project page is available at: https://jibin86.github.io/flowblending_project_page.
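The stage-aware switching described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the stage boundaries (`early_end`, `late_start`), the Euler update, and all function names are assumptions; the paper derives its actual boundaries from criteria such as the velocity-divergence analysis.

```python
# Illustrative sketch of stage-aware multi-model sampling (FlowBlending-style).
# Boundaries and the Euler integrator are placeholder assumptions.

def select_model(step, num_steps, large_model, small_model,
                 early_end=0.2, late_start=0.8):
    """Use the large model in the capacity-sensitive early/late stages,
    and the small model in the intermediate stage."""
    frac = step / num_steps
    if frac < early_end or frac >= late_start:
        return large_model
    return small_model

def flow_sample(x, large_model, small_model, num_steps=50):
    """Plain Euler integration of a learned velocity field, switching
    between the two models per stage (illustrative only)."""
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        model = select_model(step, num_steps, large_model, small_model)
        v = model(x, t)   # model predicts the velocity at (x, t)
        x = x + dt * v    # Euler step along the flow
    return x
```

Because only the intermediate steps run the small model, the FLOPs saved scale with the width of the intermediate stage, which is how the reported speedups arise.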