VVS: Accelerating Speculative Decoding for Visual Autoregressive Generation via Partial Verification Skipping

Visual autoregressive (AR) generation models have demonstrated strong potential for image generation, yet their next-token-prediction paradigm introduces considerable inference latency. Although speculative decoding (SD) has proven effective for accelerating visual AR models, its "draft one step, then verify one step" paradigm prevents a direct reduction in the number of forward passes, which limits the achievable acceleration. Motivated by the interchangeability of visual tokens, we explore, for the first time, verification skipping in the SD process of visual AR generation, explicitly cutting the number of target model forward passes and thereby reducing inference latency. From an analysis of the drafting stage, we observe that verification redundancy and the reusability of stale features are the key factors for retaining generation quality and speedup in verification-free steps. Building on these two observations, we propose VVS, a novel SD framework that accelerates visual AR generation via partial verification skipping. It integrates three complementary modules: (1) a verification-free token selector with dynamic truncation, (2) token-level feature caching and reuse, and (3) fine-grained scheduling of skipped steps. As a result, VVS reduces the number of target model forward passes by a factor of $2.8\times$ relative to vanilla AR decoding while maintaining competitive generation quality, offering a better speed-quality trade-off than conventional SD frameworks and showing strong potential to reshape the SD paradigm.
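To make the forward-pass argument concrete, the following is a minimal counting sketch, not the VVS algorithm itself. It assumes a deliberately simplified model in which every verified step costs one target forward pass and emits a fixed number of tokens, and every skipped step emits drafted tokens with no target pass. The function `target_passes` and its parameters (`tokens_per_verify`, `skip_period`, `tokens_per_skip`) are hypothetical names introduced only for illustration, and the numbers are toy values, not results from the paper.

```python
def target_passes(num_tokens: int, tokens_per_verify: int,
                  skip_period: int = 0, tokens_per_skip: int = 0) -> int:
    """Count target-model forward passes needed to emit `num_tokens` tokens.

    Hypothetical toy model (an assumption, not the paper's method):
      - a verified step costs one target pass and emits `tokens_per_verify` tokens;
      - every `skip_period`-th step is verification-free and emits
        `tokens_per_skip` drafted tokens at no target cost;
      - `skip_period == 0` disables skipping entirely.
    """
    if tokens_per_verify < 1:
        raise ValueError("tokens_per_verify must be at least 1")
    if skip_period == 1:
        raise ValueError("skip_period == 1 would skip every verification")
    if skip_period > 1 and tokens_per_skip < 1:
        raise ValueError("tokens_per_skip must be at least 1 when skipping")

    emitted = 0   # tokens produced so far
    passes = 0    # target forward passes spent so far
    step = 0      # index of the current draft step (1-based)
    while emitted < num_tokens:
        step += 1
        if skip_period > 1 and step % skip_period == 0:
            # Verification-free step: accept drafted tokens without the target.
            emitted += tokens_per_skip
        else:
            # Conventional step: one target pass verifies the draft.
            passes += 1
            emitted += tokens_per_verify
    return passes


# Toy comparison for a 1024-token image.
vanilla = target_passes(1024, tokens_per_verify=1)            # one token per pass
plain_sd = target_passes(1024, tokens_per_verify=2)           # every step verified
skipping = target_passes(1024, tokens_per_verify=2,
                         skip_period=2, tokens_per_skip=2)    # every 2nd step skipped

print(vanilla, plain_sd, skipping)
```

In this toy setting, conventional SD can only lower the pass count by accepting more tokens per verification, whereas skipping removes target passes outright. What the sketch leaves out is the hard part the abstract describes: choosing which tokens are safe to accept unverified, reusing cached features, and scheduling the skipped steps so that quality is preserved.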
