Recent progress in Large Language Models (LLMs) opens promising opportunities for Verilog code generation, which is important for automated circuit design. However, the lack of meaningful functional rewards hinders preference optimization based on Reinforcement Learning (RL) for producing functionally correct Verilog code. In this paper, we propose Signal-Aware Learning for Verilog code generation (QiMeng-SALV), which leverages code segments of functionally correct output signals to optimize RL training. Because Verilog code specifies the structural interconnection of hardware gates and wires, different output signals are independent. The key insight of QiMeng-SALV is therefore to extract verified signal-aware implementations from partially incorrect modules, so as to recover more meaningful functional rewards. Specifically, we verify the functional correctness of each signal in a generated module by comparing it with the corresponding signal of the reference module in the training data. An abstract syntax tree (AST) is then employed to identify signal-aware code segments in erroneous modules that can provide meaningful functional rewards. Finally, we introduce signal-aware Direct Preference Optimization (DPO), which is optimized on the correct signal-level code segments, thereby preventing noise and interference from incorrect signals. QiMeng-SALV marks a paradigm shift from conventional module-level optimization to fine-grained signal-level optimization in Verilog code generation, addressing the issue of insufficient functional rewards. Experiments demonstrate that our method achieves state-of-the-art performance on VerilogEval and RTLLM, with a 7B-parameter model matching the performance of the DeepSeek v3 671B model and significantly outperforming the leading open-source model CodeV trained on the same dataset. Our code is available at https://github.com/zy1xxx/SALV.
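To make the signal-level idea concrete, the following is a minimal Python sketch and not the released implementation. It assumes that each module's output signals are available as simulated traces keyed by signal name, and that a token mask marking the code segments of verified signals has already been derived (for example from an AST pass, which is not shown). The function names `correct_signals`, `masked_logprob`, and `signal_aware_dpo_loss`, the mask convention, and the default `beta` are all hypothetical choices made for illustration. The loss is the standard DPO objective with log-probabilities restricted to masked tokens, which is one plausible reading of "optimized on the correct signal-level code segments".

```python
import math
from typing import Dict, List, Sequence


def correct_signals(
    generated: Dict[str, Sequence[int]],
    reference: Dict[str, Sequence[int]],
) -> List[str]:
    """Return the output signals whose simulated trace equals the reference trace.

    Assumption: traces come from simulating both modules on the same stimuli.
    """
    return sorted(
        name
        for name, trace in reference.items()
        if name in generated and list(generated[name]) == list(trace)
    )


def masked_logprob(token_logps: Sequence[float], mask: Sequence[int]) -> float:
    """Sum token log-probabilities only at positions where the mask is 1."""
    if len(token_logps) != len(mask):
        raise ValueError("token_logps and mask must have the same length")
    return sum(lp for lp, m in zip(token_logps, mask) if m)


def signal_aware_dpo_loss(
    policy_chosen: Sequence[float],
    ref_chosen: Sequence[float],
    chosen_mask: Sequence[int],
    policy_rejected: Sequence[float],
    ref_rejected: Sequence[float],
    rejected_mask: Sequence[int],
    beta: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """Standard DPO loss computed over masked tokens only (illustrative)."""
    chosen_margin = masked_logprob(policy_chosen, chosen_mask) - masked_logprob(
        ref_chosen, chosen_mask
    )
    rejected_margin = masked_logprob(policy_rejected, rejected_mask) - masked_logprob(
        ref_rejected, rejected_mask
    )
    x = -beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(-x)) equals softplus(x); this form avoids overflow for large |x|.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# Usage: the "sum" output matches the reference, the "carry" output does not.
generated = {"sum": [0, 1, 1, 0], "carry": [0, 0, 0, 0]}
reference = {"sum": [0, 1, 1, 0], "carry": [0, 0, 0, 1]}
verified = correct_signals(generated, reference)

# Only the first token is assumed to belong to a verified signal's code segment.
loss = signal_aware_dpo_loss(
    policy_chosen=[-0.5, -4.0],
    ref_chosen=[-1.0, -1.0],
    chosen_mask=[1, 0],
    policy_rejected=[-1.0, -1.0],
    ref_rejected=[-1.0, -1.0],
    rejected_mask=[1, 1],
)
```

In this sketch, tokens outside the mask contribute nothing to the loss, so code belonging to incorrect signals neither rewards nor penalizes the policy. How the masks are built and which side of the preference pair they apply to are assumptions here; the abstract does not specify them.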