The widespread adoption of large language models (LLMs) across industries has increased the demand for high-quality, customizable outputs. However, traditional alignment methods typically require retraining large pretrained models, making it difficult to adapt and optimize LLMs quickly for diverse applications. To address this limitation, we propose a novel \textit{Residual Alignment Model} (\textit{RAM}) that formalizes the alignment process as a form of importance sampling. In this framework, the unaligned upstream model serves as the proposal distribution, while alignment is cast as a secondary sampling step driven by an autoregressive alignment module that estimates the importance weights. This design naturally decouples the alignment module from the target aligned model, improving flexibility and scalability. Building on this formulation, we derive an efficient sequence-level training strategy for the alignment module that operates independently of the proposal module. We further develop a resampling algorithm with iterative token-level decoding to address the first-token latency issue common to comparable methods. Experimental evaluations on two leading open-source LLMs across diverse tasks, including instruction following, domain adaptation, and preference optimization, demonstrate that our approach consistently outperforms baseline models.
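To make the importance-sampling view above concrete, a minimal sketch of the implied factorization is given below; the symbols $\pi_{\mathrm{up}}$, $\pi_{\mathrm{tgt}}$, and $r_\theta$ are illustrative placeholders rather than the paper's notation:
\begin{equation*}
\pi_{\mathrm{tgt}}(y \mid x) \;\propto\; \pi_{\mathrm{up}}(y \mid x)\, r_\theta(y \mid x),
\end{equation*}
where $\pi_{\mathrm{up}}$ denotes the unaligned upstream (proposal) model, $r_\theta$ denotes the autoregressive alignment module that estimates the importance weight of a candidate response $y$ to a prompt $x$, and sampling from the aligned target $\pi_{\mathrm{tgt}}$ amounts to drawing candidates from $\pi_{\mathrm{up}}$ and resampling them in proportion to $r_\theta$.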