Recent work has framed constrained text generation with autoregressive language models as a probabilistic inference problem. Notably, Zhao et al. (2024) introduced a promising approach based on twisted Sequential Monte Carlo (SMC), which incorporates learned twist functions and twist-induced proposals to guide the generation process. However, in constrained generation settings where the target distribution concentrates on outputs that are unlikely under the base model, learning the twists is challenging because reward signals are sparse and uninformative. We show that iteratively refining the base model through self-distillation alleviates this issue: each round makes the model progressively more aligned with the target, leading to substantial gains in generation quality.
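To make the iterative self-distillation loop concrete, the following is a minimal sketch under assumed interfaces; the names `learn_twists`, `smc_sample`, and `finetune` are hypothetical placeholders for twist learning, twisted SMC sampling, and maximum-likelihood fine-tuning, and do not reflect the authors' actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    """One SMC particle: a token sequence and its importance log-weight."""
    tokens: List[int]
    log_weight: float


def self_distill(
    base_model,                 # autoregressive LM to be refined
    learn_twists: Callable,     # fits twist functions to the current model (assumed)
    smc_sample: Callable,       # twisted SMC sampler returning List[Sample] (assumed)
    finetune: Callable,         # fine-tunes the model on token sequences (assumed)
    num_rounds: int = 3,
    num_particles: int = 64,
):
    """Iteratively refine base_model toward the constrained target.

    Each round: (1) learn twists for the current base model, (2) draw
    approximate target samples via twisted SMC, (3) distill those samples
    back into the base model by fine-tuning. The refined model then serves
    as the base for the next round, so proposals start closer to the target
    and the reward signal becomes progressively less sparse.
    """
    for _ in range(num_rounds):
        twists = learn_twists(base_model)
        particles = smc_sample(base_model, twists, num_particles)
        # Use the sampled sequences as self-distillation data.
        data = [p.tokens for p in particles]
        base_model = finetune(base_model, data)
    return base_model
```

The key design point is that distillation moves probability mass toward the constrained target before the next round of twist learning, which is what mitigates the sparse-reward problem described above.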