Large language models (LLMs) trained via reinforcement learning with verifiable reward (RLVR) have achieved breakthroughs on tasks with explicit, automatable verification, such as software programming and mathematical problems. Extending RLVR to electronic design automation (EDA), especially to the automatic generation of hardware description languages (HDLs) such as Verilog from natural-language (NL) specifications, however, poses three key challenges: the lack of automated and accurate verification environments, the scarcity of high-quality NL-code pairs, and the prohibitive computational cost of RLVR. To this end, we introduce CodeV-R1, an RLVR framework for training Verilog-generation LLMs. First, we develop a rule-based testbench generator that performs robust equivalence checking against golden references. Second, we propose a round-trip data-synthesis method that pairs open-source Verilog snippets with LLM-generated NL descriptions, verifies code-NL-code consistency via the generated testbenches, and filters out inequivalent examples to yield a high-quality dataset. Third, we employ a two-stage "distill-then-RL" training pipeline: distillation for a cold start of reasoning abilities, followed by adaptive DAPO, a novel RLVR algorithm that reduces training cost by adaptively adjusting the sampling rate. The resulting model, CodeV-R1-7B, achieves 68.6% and 72.9% pass@1 on VerilogEval v2 and RTLLM v1.1, respectively, surpassing the prior state of the art by 12–20% and even exceeding the performance of the 671B DeepSeek-R1 on RTLLM. We have released our model, training code, and dataset to facilitate research in the EDA and LLM communities.
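The round-trip data-synthesis step can be illustrated with a minimal sketch. The helper callables below (an LLM summarizer, an LLM code generator, and a testbench-based equivalence check) are hypothetical placeholders supplied by the caller, not the released CodeV-R1 implementation; the sketch only shows the filtering logic, in which a snippet is kept when the Verilog regenerated from its NL description is judged equivalent to the golden reference.

```python
from typing import Callable, Iterable, List, Dict

def round_trip_filter(
    snippets: Iterable[str],
    describe: Callable[[str], str],      # hypothetical: LLM maps Verilog -> NL description
    regenerate: Callable[[str], str],    # hypothetical: LLM maps NL description -> candidate Verilog
    equivalent: Callable[[str, str], bool],  # hypothetical: rule-based testbench equivalence check
) -> List[Dict[str, str]]:
    """Keep only NL-code pairs whose regenerated Verilog is functionally
    equivalent to the golden reference; inequivalent examples are filtered out."""
    dataset = []
    for golden in snippets:
        spec = describe(golden)            # code -> NL description
        candidate = regenerate(spec)       # NL description -> candidate code
        if equivalent(golden, candidate):  # equivalence against the golden reference
            dataset.append({"instruction": spec, "code": golden})
    return dataset
```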