Hound introduces a relation-first graph engine that improves system-level reasoning across interrelated components in complex codebases. The agent designs flexible, analyst-defined views with compact annotations (e.g., monetary/value flows, authentication/authorization roles, call graphs, protocol invariants) and uses them to anchor exact retrieval: for any question, it loads precisely the code that matters (often across components) so it can zoom out to system structure and zoom in to the decisive lines. A second contribution is a persistent belief system: long-lived vulnerability hypotheses whose confidence is updated as evidence accrues. The agent employs coverage-versus-intuition planning and a QA finalizer to confirm or reject hypotheses. On a five-project subset of ScaBench[1], Hound improves recall and F1 over a baseline LLM analyzer (micro recall 31.2% vs. 8.3%; F1 14.2% vs. 9.8%) with a modest precision trade-off. We attribute these gains to flexible, relation-first graphs that extend model understanding beyond call/dataflow to abstract aspects, plus the hypothesis-centric loop; code and artifacts are released to support reproduction.
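The persistent belief system described above can be pictured as a store of long-lived hypotheses whose confidence moves as evidence accrues. The paper does not specify the update rule, so the following is only an illustrative sketch using a Bayesian-style odds update; the `Hypothesis` class, the `likelihood_ratio` parameter, and the example findings are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    """A long-lived vulnerability hypothesis with an evolving confidence score."""
    description: str
    confidence: float = 0.5          # prior belief that the vulnerability is real
    evidence: list = field(default_factory=list)

    def update(self, likelihood_ratio: float, note: str) -> float:
        """Bayesian-style update: multiply the odds by the evidence's likelihood ratio.

        likelihood_ratio > 1 supports the hypothesis; < 1 weakens it.
        """
        odds = self.confidence / (1.0 - self.confidence)
        odds *= likelihood_ratio
        self.confidence = odds / (1.0 + odds)
        self.evidence.append(note)
        return self.confidence

# Hypothetical usage: confidence rises on supporting code evidence,
# then falls when a mitigating pattern is found.
h = Hypothesis("reentrancy in withdraw()")
h.update(4.0, "external call before state update")   # confidence: 0.5 -> 0.8
h.update(0.5, "nonReentrant modifier present")       # confidence: 0.8 -> ~0.667
```

A QA finalizer, as in the abstract, would then confirm hypotheses whose confidence clears a threshold and reject the rest.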