DRS-OSS: LLM-Driven Diff Risk Scoring Tool for PR Risk Prediction

From arXiv: 8 pages, 4 figures. Includes system architecture diagrams, Web UI screenshots, GitHub App examples, and an appendix with API endpoints. A full replication package and demo materials are available.

In large-scale open-source projects, hundreds of pull requests land daily, each a potential source of regressions. Diff Risk Scoring (DRS) estimates the likelihood that a diff will introduce a defect, enabling better review prioritization, test planning, and CI/CD gating. We present DRS-OSS, an open-source DRS system equipped with a public API, web UI, and GitHub plugin. DRS-OSS uses a fine-tuned Llama 3.1 8B sequence classifier trained on the ApacheJIT dataset, consuming long-context representations that combine commit messages, structured diffs, and change metrics. Through parameter-efficient adaptation, 4-bit QLoRA, and DeepSpeed ZeRO-3 CPU offloading, we train 22k-token contexts on a single 20 GB GPU. On the ApacheJIT benchmark, DRS-OSS achieves state-of-the-art performance (F1 = 0.64, ROC-AUC = 0.89). Simulations show that gating only the riskiest 30% of commits can prevent up to 86.4% of defect-inducing changes. The system integrates with developer workflows through an API gateway, a React dashboard, and a GitHub App that posts risk labels on pull requests. We release the full replication package, fine-tuning scripts, deployment artifacts, code, demo video, and public website.
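
The gating claim in the abstract (review only the riskiest 30% of commits, catch most defect-inducing ones) is a recall-at-top-k measurement. The sketch below shows one way such a figure could be computed from risk scores and ground-truth labels. It is an illustration, not the authors' simulation code: the function name `recall_at_top_percent` is hypothetical, and the toy scores and labels are invented here and are not drawn from ApacheJIT.

```python
from typing import Sequence


def recall_at_top_percent(
    scores: Sequence[float], labels: Sequence[int], top_percent: int
) -> float:
    """Share of defect-inducing commits that land in the top `top_percent`
    of commits when ranked by risk score (hypothetical helper)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 <= top_percent <= 100:
        raise ValueError("top_percent must be between 0 and 100")

    total_defects = sum(labels)
    if total_defects == 0:
        return 0.0

    # Rank commits from riskiest to safest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Number of commits sent through the gate, rounded down.
    gated = len(ranked) * top_percent // 100

    # Defect-inducing commits (label 1) that the gate would have flagged.
    caught = sum(label for _, label in ranked[:gated])
    return caught / total_defects


# Toy data, invented for illustration only (1 = defect-inducing commit).
toy_scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

toy_recall = recall_at_top_percent(toy_scores, toy_labels, top_percent=30)
print(f"Gating the top 30% catches {toy_recall:.1%} of defect-inducing commits")
```

On this toy input the gate covers three of ten commits and catches two of the three defect-inducing ones. The reported 86.4% would correspond to the same ratio computed over the paper's evaluation data with the model's actual scores.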
