Steering language generation toward desired objectives or away from undesired content has been a long-standing goal in the use of language models (LMs). Recent work has demonstrated that reinforcement learning and weighted decoding are effective approaches to achieving stronger language control and higher generation quality, each with its own strengths and weaknesses. In this work, we propose a novel critic-guided decoding method for controlled language generation (CriticControl) that combines the strengths of reinforcement learning and weighted decoding. Specifically, we adopt the actor-critic framework to train an LM-steering critic from non-differentiable reward models. Similar to weighted decoding, our method freezes the language model and manipulates the output token distribution using the trained critic, improving training efficiency and stability. Evaluation of our method on three controlled generation tasks, namely topic control, sentiment control, and detoxification, shows that our approach generates more coherent and better-controlled texts than previous methods. In addition, CriticControl demonstrates superior generalization ability in zero-shot settings. Human evaluation studies also corroborate our findings.
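The abstract gives no implementation details; as a rough illustration of the critic-weighted decoding it describes, below is a minimal sketch assuming a HuggingFace-style frozen causal LM. The names `CriticHead`, `reweight_logits`, `decode_step`, and the control-strength parameter `alpha` are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

# Sketch only: a small critic head on top of a FROZEN language model.
# The critic, not the LM, is what gets trained (e.g., from reward signals).
class CriticHead(nn.Module):
    """Maps frozen-LM hidden states to a desirability score per candidate token."""
    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.score = nn.Linear(hidden_size, vocab_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.score(hidden_states)  # (batch, vocab_size)

def reweight_logits(lm_logits: torch.Tensor,
                    critic_scores: torch.Tensor,
                    alpha: float = 1.0) -> torch.Tensor:
    """Shift the frozen LM's next-token distribution toward tokens the
    critic values highly; alpha (an assumed knob) sets control strength."""
    return torch.log_softmax(lm_logits, dim=-1) + alpha * critic_scores

@torch.no_grad()
def decode_step(lm, critic: CriticHead, input_ids: torch.Tensor,
                alpha: float = 2.0) -> torch.Tensor:
    """One step of critic-weighted sampling, assuming a HuggingFace-style
    causal LM that returns `logits` and `hidden_states`."""
    out = lm(input_ids, output_hidden_states=True)
    h = out.hidden_states[-1][:, -1]   # last-layer state at the final position
    logits = out.logits[:, -1]         # frozen LM's next-token logits
    scores = critic(h)                 # critic score for each candidate token
    probs = torch.softmax(reweight_logits(logits, scores, alpha), dim=-1)
    return torch.multinomial(probs, num_samples=1)  # sample next token
```

Because the LM parameters stay frozen and only the decoding distribution is reweighted, control can be strengthened or relaxed at inference time simply by adjusting `alpha`, without retraining the base model.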