Multi-agent deployments of large language models (LLMs) are increasingly embedded in market, allocation, and governance workflows, yet covert coordination among agents can silently erode trust and social welfare. Existing audits are dominated by heuristics that lack theoretical guarantees, transfer poorly across tasks, and seldom ship with the infrastructure needed for independent replication. We introduce Audit the Whisper, a conference-grade research artifact spanning theory, benchmark design, detection, and reproducibility. Our contributions are: (i) a channel-capacity analysis showing how interventions such as paraphrase, rate limiting, and role permutation impose quantifiable capacity penalties, operationalised via paired-run Kullback--Leibler diagnostics, that tighten mutual-information thresholds with finite-sample guarantees and full proofs; (ii) ColludeBench-v0, a benchmark covering pricing, first-price auctions, and peer review, with hosted Gemini/Groq API backends, configurable covert schemes, deterministic manifests, and reward instrumentation; and (iii) a calibrated auditing pipeline that fuses cross-run mutual information, permutation invariance, watermark variance, and fairness-aware acceptance bias, each detector tuned to a $10^{-3}$ false-positive budget and validated on 10,000 honest runs together with an e-value martingale. Across ColludeBench and external suites including Secret Collusion, CASE, Perfect Collusion Benchmark, and SentinelAgent, the union meta-test attains state-of-the-art power at a fixed false-positive rate, while ablations surface price-of-auditing trade-offs and fairness-driven colluders that are invisible to mutual information alone. We release regeneration scripts, anonymised manifests, and documentation so that external auditors can reproduce every figure, satisfy double-blind requirements, and extend the framework with minimal effort.
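The cross-run mutual-information test at the heart of contribution (iii) can be illustrated with a minimal sketch. Assuming the detector reduces to a plug-in MI estimate over discretised per-agent message statistics from paired runs, a permutation null calibrates the decision threshold to the $10^{-3}$ false-positive budget. The function names, binning, and permutation count below are illustrative assumptions, not the released pipeline's API.

```python
import numpy as np

def plugin_mi(x: np.ndarray, y: np.ndarray, bins: int = 16) -> float:
    """Plug-in mutual information (nats) between two discretised run statistics."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal over x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal over y, shape (1, bins)
    mask = pxy > 0
    return float((pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])).sum())

def calibrate_threshold(x_honest: np.ndarray, y_honest: np.ndarray,
                        alpha: float = 1e-3, n_perm: int = 20_000,
                        seed: int = 0) -> float:
    """Permutation-calibrate the MI threshold to a false-positive budget alpha."""
    rng = np.random.default_rng(seed)
    null = np.array([plugin_mi(x_honest, rng.permutation(y_honest))
                     for _ in range(n_perm)])
    return float(np.quantile(null, 1.0 - alpha))

# A paired run is flagged when its observed MI exceeds the calibrated threshold;
# x_honest, y_honest would hold matched message statistics from honest-run pairs.
```

Note that at $\alpha = 10^{-3}$ the tail quantile rests on only a handful of null samples unless the permutation count far exceeds $1/\alpha$, which is consistent with the abstract's choice to validate calibration on 10,000 honest runs and an anytime-valid e-value martingale rather than permutations alone.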