Automated code review adoption lags in compliance-heavy settings, where static analyzers produce high-volume, low-rationale output and naive LLM use risks hallucination and cost overhead. We present a production system for grounded, PR-native review that pairs static-analysis findings with AST-guided context extraction and a single-GPU, on-demand serving stack (quantized open-weight model, multi-tier caching) to deliver concise explanations and remediation guidance. Evaluated on safety-oriented C/C++ standards, the approach achieves sub-minute median time to first feedback (offline p50 build+LLM: 59.8 s) while maintaining competitive violation reduction and lower violation rates than larger proprietary models. The architecture is decoupled: teams can adopt the grounding/prompting layer or the serving layer independently. A small internal survey (n=8) provides directional evidence of reduced triage effort and moderately perceived grounding, with participants reporting fewer human review iterations. We outline operational lessons and limitations, emphasizing reproducibility, auditability, and pathways to broader standards coverage and assisted patching.